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Preface 

The  desire  to  undertake  research  in  the  field  of  system 
identification  initially  grew  out  of  my  experience  as  a 
practicing  aerospace  engineer  working  principally  on  the 
design  and  analysis  of  stability-and-control  characteristics 
of  vertical  take-off  and  landing  (VTOL)  aircraft.  The 
myriad  problems  encountered  in  this  regard  soon  led  me  to 
graduate  study  in  both  aerospace  and  electrical  engineering, 
chiefly  in  the  areas  of  aeromechanics  and  automatic  control 
theory.  As  a  result,  it  soon  became  increasingly  apparent 
that  the  VTOL  parameter  identification  problem  was  concep¬ 
tually  similar  to  identification  problems  frequently 
occurring  in  various  areas  of  electrical  engineering.  My 
interest  was  further  nurtured  when  subsequent  graduate  work 
in  economics  and  systems  analysis  revealed  still  other 
applications  requiring  identification  of  the  parameters  of 
linear  stochastic  systems. 

The  first  three  chapters  of  this  thesis  should  be  of 
particular  interest  to  the  reader  who  has  a  general  curiosity 
about  the  subject  area,  but  who,  for  one  reason  or  another, 
is  not  particularly  interested  in  the  mathematical  details. 

In  these  chapters  I  have  endeavored  to  introduce  basic 
concepts  and  present  a  non-technical  assessment  of  the 
current  state-of-the-art,  including  a  descriptive  exposition 
of  some  of  the  more  popular  methods  of  system  identification. 
Where  material  has  been  condensed  and  summarized  from  the 
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literature,  it  has  been  my  aim  to  do  so  in  such  a  way  as 
to  cis.ir.tiin  sn  objective,  illuminating  perspective  on  the 
subj  ect . 

My  research  into  the  subject  area  began  with  a  fairly 
extensive  literature  search,  the  results  of  which  are  con¬ 
tained  in  the  bibliography.  The  articles  it  lists  are  keyed 
not  only  by  identification  methods,  but  also  by  general 
headings  directly  related  to  system  identification.  The 
bibliography  is,  I  think,  relatively  complete,  and  should 
serve  as  a  ready  source  for  the  reader  who  wishes  to  investi¬ 
gate  further  certain  areas  of  system  identification. 

I  wish  to  express  my  sincere  gratitude  to  my  thesis 
advisor.  Dr.  David  R.  Barr,  Department  of  Mathematics,  for 
the  many  helpful  suggestions  and  constructive  comments  he 
provided  throughout  this  study.  Finally,  special  thanks  are 
offered  to  my  wife,  Veronica.  Without  her  patience  and 
understanding  this  investigation  could  not  have  been 
completed . 
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Abstract 

This  research  considers  the  problem  of  the  identifica¬ 
tion  of  linear  stochastic  systems.  A  current  state-of-the- 
art  assessment  of  the  general  field  of  system  identification 
is  given,  and  criteria  for  the  classification  and  selection 
of  identification  methods  are  presented  and  discussed. 

Several  of  the  more  popular  identification  methods  from  the 
literature  are  investigated  and  summarized.  A  bibliography 
containing  185  references,  keyed  by  identification  methods 
and  other  relevant  headings,  is  included. 

Using  the  state  variable  formulation  for  a  discrete 
linear  stochastic  system,  a  detailed  exposition  of  a  few 
of  the  on-line  identification  methods  currently  appearing 
in  the  literature  is  presented.  One  such  method,  based  on 
the  autocorrelation  function  of  the  output  measurements,  is 
developed  to  identify  the  state  transition  matrix  and  the 
output  noise  covariance  (vector- input ,  scalar-output  case). 

It  is  shown  that  a  canonic?1  or  phase  variable  system 
representation  can  be  used  to  reduce  the  number  of  unknown 
parameters  requiring  identification.  Finally,  an  on-line 
identification  method  called  Levy's  proper  canonical  form, 
which  is  based  on  the  Kalman  filter  representation  of  the 
system,  is  derived  using  the  innovations  sequence  and  certain 
results  from  optimal  estimation  theory.  It  is  shown  that 
this  identification  method  results  in  still  additional 
advantages  over  the  identification  methods  previously 
developed  . 
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THE  IDENTIFICATION  OF  LINEAR 
STOCHASTIC  SYSTEMS 

I.  Introduction 


Purpose  ( 

The  purpose  of  this  thesis  is  to  present: 

1.  A  survey  of  the  current  state-of-the-art  of  the 
identification  of  linear  stochastic  systems. 

2.  A  classification  of  many  of  the  various  identifica¬ 
tion  methods  currently  appearing  in  the  literature,  in  terms 
of  the  type  of  the  model,  input  signal,  and  optimization 
criterion  used. 

3.  The  relevant  properties  of  the  different  identifica¬ 
tion  methods  that  differentiate  one  from  another,  and  provide 
the  systems  analyst  with  valuable  insight  into  which  identi¬ 
fication  method  to  use  in  a  given  application. 

4.  The  development  of  selective  topics  from  the  litera¬ 
ture  of  system  identification,  including  Mehra's  on-line 
identification  method  and  system  identification  via  adaptive 
Kalman  filtering. 

The  Identification  Prob 1 em 

Very  basically,  the  ident i f i cat  ion  prob 1 em  is  simply 
the  determination  of  the  mathematical  model  that  describes  a 
given  physical  process  (i.e.,  the  structure  and  parameters 
of  the  mathematical  model).  If  the  system  designer  is  only 
interested  in  the  parameter  identification  problem,  he  may 
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choose  to  model  the  process  as  shown  in  Fig.  1.  In  the  model 
depicted  in  Fig.  1,  the  plant  is  that  element  which  relates 
the  system  input  to  the  system  output.  As  such,  the  plant 


Input 

Plant 

Output 

Fig.  1.  Model  for  Parameter  Identification  Only. 

can  be  thought  of  as  consisting  of  the  process  parameters 
interconnected  through  some  mathematical  relationship  such 
that,  for  a  given  system  input,  the  output  states  of  the 
process  are  completely  determined.  Hence,  for  a  discrete¬ 
time  process,  the  coefficients  (constant  or  t ime- varying) 
of  the  difference  equation  relating  system  input  to  output 
may  be  regarded  as  functions  of  the  process  parameters. 

In  many  physical  processes,  some  or  all  of  the  process 
parameters  may  be  unknown,  or  may  be  known  initially  but 
change  stochastically  as  the  process  unfolds.  The  parameter 
identification  prob 1 em  consists  of  the  determination  of  these 
unknown  parameters. 

During  the  past  few  years  an  increasing  number  of  papers 
have  appeared  in  the  literature  on  the  general  subject  of 
system  identification.  These  papers  have  appeared  in  such 
diverse  fields  as  economics  (Refs  105  and  106),  industrial 
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processes  (Refs  107  and  111),  and  biology  (Ref  104). 

Generally,  there  are  two  primary  motivations  for  the  current 
interest  in  system  identification.  First,  the  system  designer 
may  only  be  interested  in  identifying  the  para.aetf.rs  of  a 
physical  process  under  consideration.  Second,  he  may  require 
knowledge  of  the  process  parameters  for  use  in  a  subsequent 
control  engineering  application.  Although  obviously  related, 
these  two  motivations  can  have  distinctly  different  implica¬ 
tions,  which  are  discussed  at  length  in  Chapter  II. 

An  example  of  a  parameter  identification  problem  that 
arises  often  in  connection  with  economic  analysis  work 
concerns  the  Leontief  input-output  model  (Ref  109).  In  this 
model,  the  national  economy  is  broken  down  into  N  industries, 
each  of  which  can  be  thought  of  as  producing  a  certain  output 
product.  In  general,  the  output  product  of  each  industry  is 
used  as  an  input,  along  with  certain  other  inputs  (e.g.,  raw 
materials),  to  the  other  industries  in  the  economy.  An 
input-output  table  is  then  constructed.  The  elements,  a.^  , 

i,  j  *  1,  ...»  N,  appearing  in  the  table  are  called  input- 
output  coefficients,  and  refer  to  the  amount  of  input  i 
required  by  industry  j  to  produce  one  unit  of  output  product 

j.  The  identification  problem  consists  of  the  determination 
of  the  input-output  coefficients  necessary  to  sustain  a  given 
level  of  demand  for  the  various  products  in  the  economy. 

On  the  other  hand,  if  the  system  designer  is  interested 
in  the  identification  of  process  parameters  for  subsequent 
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use  in  a  control  engineering  application,  he  may  choose  to 
model  the  process  as  shown  in  Fig.  2.  The  model  shown  in 
Fig.  2  differs  from  that  shown  in  Fig.  1  due  to  the  addition 
of  a  feedback  control  loop.  As  shown,  the  feedback  loop 


Fig.  2.  Model  for  Parameter  Identification  Used 
for  Control  Application. 

contains  an  element  that  operates  on  the  output  signal  to 
yield  a  signal  that  is  compared  to  the  input  signal.  The 
difference  between  these  two  signals  is  then  used  as  the 
effective  input  signal  to  the  plant.  If  the  design  problem 
under  consideration  is  one  of  control  system  synthesis,  the 
system  designer  must  determine  the  structure  and  parameters 
of  the  unknown  feedback  element  that  will  enable  the  system 
output  to  follow,  as  closely  as  possible,  some  desired  signal 
The  design  of  such  a  control  strategy  for  a  given  process 
not  only  requires  knowledge  of  the  structure  and  parameters 
of  the  process  to  be  controlled,  but  also  knowledge  of  the 
various  process  states  and  how  they  change  with  time  (the 
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estimation  problem) .  However,  in  order  to  solve  the  estima¬ 
tion  problem,  one  must  first  postulate  a  structure  for  the 
process  model,  and  then  identify  the  corresponding  parameters 
of  the  model . 

As  an  example  of  a  situation  in  which  parameter  identifi¬ 
cation  is  Acquired  for  use  in  a  control  engineering  applica¬ 
tion,  consider  the  problem  of  designing  a  stability  augmenta¬ 
tion  system  for  a  hovering  vertical  take-off  and  landing 
(VTOL)  aircraft  (Ref  108).  If  the  process  under  consideration 
is  the  VTOL  aircraft  hovering  in  still  air,  it  is  fairly 
straightforward  to  calculate  the  process  parameters  (i.e., 
aircraft  stabi 1 ity-and-control  derivatives),  and  hence  the 
state  of  the  aircraft.  However,  as  the  VTOL  aircraft  transi¬ 
tions  from  hovering  flight  to  aerodynamic  wing- supported 
flight,  the  aerodynamic  and  propulsion  forces  acting  on  the 
aircraft  interact  in  such  a  way  as  to  make  a  reasonable 
prediction  of  the  resulting  aircraft  stabi 1 ity- and-control 
derivatives  extremely  difficult.  Thus,  a  parameter  identifi¬ 
cation  problem  exists  during  this  transition  flight  regime 
which  must  be  solved  before  a  satisfactory  stability  augmenta¬ 
tion  system  can  be  designed. 

In  connection  with  the  current  interest  in  system 
identification  for  control  engineering  application,  two 
points  should  be  made.  First,  tremendous  advances  have  been 
made  in  computer  technology  during  the  past  few  years.  These 
advances  now  enable  the  system  designer  to  develop  computcr- 
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based  algorithms  to  solve  complicated  equations  which  in 
many  instances  were  not  computationally  practical  before. 
Second,  many  control  engineering  applications  are  such  that 
if  the  process  parameters  are  unknown,  they  can  often  be 
determined  experimentally  (Ref  13S). 

Out  1 ine 

In  Chapter  II,  an  overview  of  system  identification  is 
presented.  The  general  characteristics  of  the  identification 
problem  are  discussed,  including  its  assumptions  and  limita¬ 
tions.  A  convenient  method  for  classifying  various  identifi¬ 
cation  schemes  is  also  provided,  along  with  a  discussion  of 
those  factors  that  might  be  considered  in  selecting  an 
identification  method  for  a  particular  application. 

The  identification  of  linear  stochastic  systems  is  dis¬ 
cussed  in  Chapter  III.  Identification  methods  included  are 
maximum  likelihood,  instrumental  variables,  adaptive  estima¬ 
tion,  model  reference,  stochastic  approximation,  linear 
least-squares,  and  correlation  techniques.  The  approach  taken 
in  this  chapter  is  to  summarize,  from  the  sytem  identification 
literature,  a  few  of  the  outstanding  papers  on  each  of  the 
identification  methods  listed  above.  A  comprehensive 
bibliography,  keyed  by  identification  method,  is  included  to 
indicate  other  publications  not  specifically  referenced  in 
this  paper. 

Mehra's  on-line  identification  scheme  is  the  subject  of 
Chapter  IV.  The  development  presented  includes  identification 
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of  the  state  transition  matrix,  as  well  as  the  output  noise 
covariance.  A  canonical  form  representation  of  the  system 
is  also  developed  for  the  case  where  the  designer  is  concerned 
only  with  obtaining  a  system  model  that  yields  the  desired 
output,  and  not  interested  in  the  particular  structure  of 
the  model. 

Chapter  V  approaches  the  identification  problem  through 
the  use  of  adaptive  Kalman  filtering.  The  particular  system 
representation  that  is  discussed  is  Levy's  proper  canonical 
form,  which  is  derived  using  the  innovations  sequence  and 
certain  results  from  optimal  estimation  theory.  A  method 
for  determination  of  the  optimal  filtering  gains  is  also 
discussed.  Finally,  the  conclusions  and  recommendations  for 
further  study  are  presented  in  Chapter  VI. 
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H.  An  Overview  of  System  Identification 

In  this  chapter,  the  general  field  of  system  identifi¬ 
cation  is  examined  from  an  overview  perspective.  Without 
question,  such  an  endeavor  as  this  must  of  necessity  remain 
somewhat  incomplete.  This  is  not  surprising  when  one 
considers  the  rapid  development  presently  occurring  in  system 
identification,  as  witnessed  by  the  multitude  of  recent 
papers  published  on  the  subject.  For  a  more  comprehensive 
survey,  the  reader  is  referred  to  the  literature  (Refs  142 
and  156) . 

An  investigation  into  the  literature  of  system  identifi¬ 
cation  reveals  one  immediate  fact:  there  appears  to  be 
little,  if  any,  unification  of  the  field.  In  addition  to 
the  dozen-or-so  recognized  identification  methods  appearing 
in  the  literature,  there  are  a  sizeable  number  of  publications 
within  each  method.  The  result  is  that  the  non-expert  is 
often  left  somewhat  bewildered  and  confused.  Therefore,  the 
major  aim  of  this  chapter  is  to  provide  an  exposition  of  the 
underlying  rationale  necessary  for  improved  unification  and 
classification  of  the  field  of  system  identification. 

General  Characteristics  of  Identification  Prob 1 ems 

A  crucial  first  step  in  the  formulation  of  any  identifi¬ 
cation  problem  consists  of  answering  the  following  two 
questions : 

1.  What  is  the  system  being  considered7 
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2.  What  is  the  purpose  of  the  desired  identification? 

Is  the  system  under  consideration  a  portion  of  some  larger 
process,  the  entire  process  itself,  or  the  process  in  inter¬ 
action  with  its  environment?  Obviously,  answers  to  such 
questions  are  necessary  in  order  to  clearly  define  the 
system,  and  indicate  possible  limitations  in  the  resulting 
identification  analysis.  Similarly,  having  the  purpose  for 
which  the  identification  is  desired  clearly  in  mind  can  often 
suggest  possible  identification  methods  that  might  be  appro¬ 
priate  for  a  particular  application. 

On  the  question  of  the  purpose  of  the  desired  identifi¬ 
cation,  it  has  been  pointed  out  earlier  in  this  paper  that  a 
primary  motivation  appears  to  be  possible  engineering  control 
applications.  However,  many  situations  exist  where  the 
purpose  of  the  desired  identification  is  only  to  identify  the 
parameters  of  the  process.  These  two  motivations  can  be 
distinctly  different,  possibly  implying  the  application  of 
different  identification  methods.  For  example,  consider  as 
the  process  an  aircraft  in  steady  unaccelerated  flight.  If 
the  design  task  is  to  determine  whether  the  aircraft  has  the 
required  degree  of  static  stability  at  this  flight  condition, 
the  problem  becomes  one  of  parameter  identification  alone 
(i.e.,  determination  of  the  static  stabi 1 i ty- and-control 
derivatives).  In  this  case,  rather  precise  parameter  identi¬ 
fication  is  usually  required.  However,  if  the  design  task 
is  to  determine  whether  the  aircraft  has  the  required  degree 
of  maneuver  capability,  the  problem  becomes  one  of  parameter 
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identification  for  control  application.  In  this  case,  the 
parameter  identification  part  of  the  problem  is  generally 
not  as  critical  as  in  the  former  case,  because  the  aircraft's 
flight  control  system  can  usually  be  designed  so  as  to  be 
flexible  enough  to  cover  a  variety  of  "close"  models. 

The  preceding  example  illustrates  an  important  relation¬ 
ship  that  often  exists  between  the  identification  problem 
and  the  control  problem.  In  many  situations  where  parameter 
identification  is  only  an  intermediate  step  to  a  subsequent 
control  application,  the  assumption  is  tacitly  made  that 
the  problems  of  identification  and  control  can  be  separated. 
Generally  speaking,  a  control  system  designed  using  this 
assumption  will  not  be  optimum,  primarily  because  the  parameter 
identification  obtained  is  seldom  exact.  For  this  reason,  a 
significant  number  of  papers  have  appeared  recently  in  the 
literature  dealing  with  adaptive  estimation  and  model  refer¬ 
ence  techniques.  These  methods  attempt  to  deal  with  the 
problems  of  identification  and  control  more  or  less  simul¬ 
taneously,  and  are  taken  up  in  Chapter  III. 

Classification  o f  Identification  Methods 

Generally  speaking,  there  are  two  ways  in  which  the 
system  identification  problem  may  be  approached: 

1.  Theoretical  (Mathemat i ca 1  -  Physics )  Analysis 

2.  Experimental  Analysis. 

In  the  theoretical  approach,  one  typically  begins  by  con¬ 
sidering  the  physical  laws  that  govern  a  given  process,  and 
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attempts  to  formulate  a  mathematical  model  of  the  process. 

The  identification  problem  in  this  case  becomes  the  determina¬ 
tion  of  the  parameters  of  the  model  in  terms  of  the  physical 
data  of  the  process.  On  the  other  hand,  the  experimental 
approach  attempts  to  identify  the  parameters  of  the  process 
by  measuring  the  dynamic  behavior  (input-output)  of  the 
process.  The  identification  methods  discussed  in  Chapter  III 
are  chiefly  of  the  experimental  approach. 

A  useful  framework  for  considering  the  problems  of 
system  classification  and  identification  is  provided  by  Zadeh 
(Ref  185),  who  defines  system  classification  as  follows: 

"Given  a  black  box  B  and  a  family  of  classes  of 
systems  Cj,  C2>  ....  Ck,  ...,  C  such  that  B 

belongs  to  one  of  these  classes,  say  ,  the 

(classification)  problem  is  to  determine  by 

observing  the  responses  of  B  to  various  inputs." 

The  problem  of  system  identification  is  then  defined  as  a 
special  case  of  the  classification  problem  in  which  each  of 
the  classes  C^,  C2>  ....  Cn  has  just  one  member: 

■"Given  a  class  of  systems  C  with  each  member  of 
C  completely  characterized,  the  (identification) 
problem  is  to  determine  a  system  in  C  which  is 
equivalent  to  B." 

The  foregoing  definitions  imply  that  in  order  to  formulate 
the  identification  problem,  one  must  specify  a  class  of 
systems,  a  class  of  input  signals,  and  some  criterion  for 
defining  "equivalence"  between  model  and  system.  Each  of 
these  requirements  is  discussed  briefly  below. 
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A  variety  of  representations  may  be  used  to  characterize 
different  classes  of  mathematical  models,  e.g.,  linear,  non¬ 
linear,  continuous,  discrete,  etc.  The  form  of  model  selected 
can,  of  course,  significantly  affect  the  ensueing  identifica¬ 
tion  procedure.  For  the  identification  of  linear  stochastic 
systems,  models  are  often  classified  as  either  parametric 
or  non-parametric ,  depending  upon  whether  or  not  the  appro¬ 
priate  probability  distributions  are  known.  Most  of  the 
models  discussed  in  Chapter  III  represent  discrete  linear 
systems  with  some  form  of  additive  "noise"  present. 

Although  a  wide  range  of  input  signals  may  conceivably 
be  used  in  the  identification  of  process  parameters,  major 
simplifications  in  the  identification  procedure  can  often 
be  achieved  by  choosing  input  signals  of  a  special  type. 

This  situation  is  analogous  to  the  simplification  that  occurs 
in  the  identification  of  linear  deterministic  systems  when 
an  impulse  function  is  used  as  the  input  signal.  Aoki  and 
Staley  (Ref  83)  discuss  the  input  signal  synthesis  problem 
of  parameter  identification,  and  derive  conditions  necessary 
for  obtaining  a  class  of  asymptotically  unbiased  and  efficient 
parameter  estimates.  In  Chapter  III,  the  input  signal  used 
most  frequently  in  the  various  identification  methods  is  a 
gaussian  "white-noise"  sequence. 

Once  the  form  of  the  mathematical  model  and  input  signal 
have  been  selected,  the  remaining  requirement  in  formulating 
the  identification  problem  is  the  specification  of  some 
criterion  to  be  used  to  measure  the  equivalence  between  the 
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model  and  system.  The  selection  of  such  a  criterion  essen¬ 
tially  transforms  the  identification  problem  into  an  optimi¬ 
zation  problem.  For  many  of  the  identification  methods 
appearing  in  the  literature,  this  optimization  usually 
involves  the  minimization  of  some  function  of  an  "error 
signal".  The  error  signals  most  frequently  used  are  input 
errors,  output  errors,  and  (where  necessary  probability 
distributions  are  available)  errors  in  the  parameter  values. 

Selection  of  an  Identification  Method 

Before  moving  on  to  a  discussion  of  various  identifica¬ 
tion  methods,  it  might  be  appropriate  at  this  point  to  make 
a  few  comments  concerning  some  factors  that  might  be  con¬ 
sidered  in  selecting  an  identification  method  foT  a  particular 
application.  Certainly,  a  primary  consideration  concerns  the 
purpose  for  which  the  identification  is  desired,  as  discussed 
earlier  in  this  chapter.  In  most  practical  engineering 
control  applications,  seldom  does  the  designer  have  available 
a  priori  complete  knowledge  about  the  process  to  be  controlled 
and  its  environment.  However,  any  reasonable  steps  that  the 
designer  can  take  to  increase  his  knowledge  concerning  the 
system- -e . g .  ,  rough  calculations,  comparisons  with  similar 
systems,  use  of  engineering  experience  and  judgemcnt--can 
often  provide  valuable  insight  into  possible  identification 
methods  that  might  be  appropriate. 

Another  factor  that  can  sometimes  influence  the  choice 
of  a  particular  identification  procedure  concerns  the  required 
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accuracy  of  the  identification  analysis.  Regardless  of 
whether  the  ultimate  purpose  of  the  identification  analysis 
is  parameter  identification  or  control  application,  the 
degree  of  accuracy  required  is  an  important  factor  that  can 
often  dictate  the  type  of  identification  method  to  employ. 
However,  a  major  problem  exists  in  this  regard  in  deter¬ 
mining  the  degree  of  accuracy  required  of  the  identification 
analysis,  and  formulating  a  criterion  for  measuring  that 
accuracy.  Unfortunately,  the  literature  of  system  identfica 
tion  is  far  from  complete  in  the  area  of  the  accuracy  of 
identification  analyses. 
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III.  The  Identification  of  Linear 
Stochastic  Systems 

The  overview  of  system  identification  presented  in  the 
previous  chapter  is  used  as  the  basis  for  a  general  discus¬ 
sion  of  various  identification  methods  in  this  chapter. 

The  approach  selected  to  accomplish  this  objective  is  to 
summarize,  from  the  system  identification  literature,  one 
or  more  recent  papers  felt  to  be  illustrative  of  each  iden¬ 
tification  method  discussed.  In  order  to  keep  the  discussion 
in  this  chapter  as  generalized  as  possible,  most  of  the 
notational  details  peculiar  to  the  various  identification 
methods  have  been  umiiteu.  However,  many  of  the  definitions 
are  included  in  the  first  two  sections  of  Chapter  IV;  the 
others  may  be  found  by  consulting  the  indicated  references. 
This  approach  is  not  meant  to  be  totally  comprehensive,  but 
rather  merely  to  expose  the  reader  to  some  of  the  underlying 
concept s  attendant  to  the  different  identification  methods. 

An  endeavor  such  as  this  is  almost  certain  to  be  some¬ 
what  incomplete,  as  no  attempt  has  been  made  to  include 
every  identification  method  or  variation  appearing  in  the 
literature.  Furthermore,  it  is  conceivable  that  some  appli¬ 
cable  papers  may  have  been  inadvertently  overlooked.  It  is 
hoped,  however,  that  the  more  popular  methods  have  been 
included  for  discussion  in  this  chapter.  A  list  of  refer¬ 
ences  for  each  identification  method  is  contained  in  the 
bibl iography . 
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Maximum  Likelihood 

The  maximum  likelihood  identification  of  the  parameters 
of  a  discrete,  stationary  linear  stochastic  system  from 
noisy  input-output  measurements  is  discussed  by  Kashyap 
(Ref  4).  In  particular,  he  considers  the  problem  of  fitting 
linear  models  to  the  observed  output  data  of  a  physical 
process.  The  mathematical  model  used  to  describe  the  process 
consists  of  a  vector  input-output,  linear  nth  order  differ¬ 
ence  equation  having  constant  but  unknown  matrix  coefficients. 
The  system  is  assumed  to  be  driven  by  a  zero-mean  input  noise 
disturbance  which  is  completely  specified  by  a  set  of  n 
unknown  correlation  matrices.  The  output  measuremen Is  are 
assumed  to  be  contaminated  by  a  zero-mean,  uncorrelated 
measurement  noise  sequence  having  a  constant  but  unknown 
correlation  matrix.  The  input  and  output  noise  sequences 
are  assumed  to  be  mutually  independent,  and  the  identifica¬ 
tion  problem  consists  of  identifying,  from  the  input-output 
record  of  the  process,  optimal  estimates  for  the  various 
unknown  matrices.  The  approach  selected  by  Kashyap  is  to 
denote  the  parameters  of  the  unknown  matrices  by  a  parameter 
vector,  and  assume  that  the  conditional  probability  density 
function  of  the  output  measurement  (given  all  previous  output 
measurements  and  the  unknown  parameter  vector)  has  a  multi¬ 
variate  normal  distribution.  This  allows  the  likelihood 
function  to  be  derived,  and  then  maximized  with  respect  to 
the  unknown  parameter  vector.  The  optimal  estimates  are  then 
obtained  by  solving  a  resulting  parameter  optimization 
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problem.  Conditions  are  also  derived  under  which  the  maxi¬ 
mum  likelihood  estimates  (MLE)  are  unique,  asymptotically 
unbiased,  and  consistent. 

The  problem  of  determining  the  unknown  parameters  of  a 
dynamic  system  from  noisy  input-output  observations  is  also 
considered  by  Aoki  and  Yue  (Ref  1).  Their  paper  examines 
in  detail  the  estimation  errors  of  two  algorithms  that 
approximately  compute  the  maximum  likelihood  estimates  of 
the  system  parameters.  A  scalar  input -output ,  linear  nth 
order  difference  equation  with  constant  but  unknown  coeffi¬ 
cients  is  used  as  the  mathematical  model  of  the  system.  The 
input  and  output  observations  are  assumed  to  be  corrupted  by 
mutually  independent,  additive  white-noise  sequences  having 
a  standard  normal  distribution.  The  approach  taken  by  Aoki 
and  Yue  is  to  combine  the  unknown  coefficients  into  a 
parameter  vector,  and  show  that  the  approximate  MLE  of  the 
unknown  parameter  vector  is  equivalent  to  the  computation  of 
an  eigenvector  of  a  real  symmetric  matrix.  It  is  shown  that 
the  algorithms  for  the  approximate  MLE  yield  a  global  optimum 
for  the  unknown  parameter  vector,  whereas  the  algorithms  for 
the  true  MLE  yield  only  local  optimum  and  hence  require 
iteration  to  obtain  the  global  optimum.  Necessary  and  suffi¬ 
cient  conditions  are  derived  for  the  approximate  MLE  to 
converge  with  probability  one  to  its  true  value  as  the  sample 
size  tends  to  infinity.  For  finite  sample  sizes,  explicit 
bounds  are  derived  for  the  mean-square  error  of  the  approxi¬ 
mate  MLE. 
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The  maximum  likelihood  identification  of  stochastic 
linear  dynamic  systems  using  the  Kalman  filtering  representa¬ 
tion  is  discussed  by  Mehra  (Ref  5) .  The  mathematical  model 
used  is  a  state  variable  representation  of  a  discrete, 
single-input,  single-output  linear  dynamic  system  with 
constant  but  unknown  system  matrices.  The  input  and  output 
observations  are  assumed  to  be  corrupted  by  zero-mean, 
mutually  independent,  gaussian  white-noise  sequences  having 
constant  but  unknown  covariance  matrices.  The  approach 
followed  by  Mehra  is  to  transform  the  given  model  to  an 
equivalent  state  space  representation  called  "Levy's  proper 
canonical  form"  (this  representation  is  discussed  in  detail 
in  Chapter  V).  In  effect,  the  output  measurements  are 
"whitened"  through  use  of  a  causal  invertible  linear  trans¬ 
formation  (Kalman  filter).  The  MLE  of  the  unknown  system 
parameters  is  then  obtained  by  maximizing  the  conditional 
probability  density  function  of  the  output  measurement  (given 
all  previous  output  measurements  and  the  unknown  system 
parameters),  subject  to  the  equations  defined  by  Levy's 
proper  canonical  form.  Conditions  are  indicated  under  which 
the  MLE  are  unbiased,  consistent,  and  efficient. 

Instrumental  Variables 

Wong  and  Polak  (Ref  15)  consider  the  use  of  the  instru¬ 
mental  variable  method  to  estimate  the  parameters  of  discrete, 
linear  time-invariant  systems.  In  general,  the  method  is 
restricted  to  applications  in  which  the  control  led- input 
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(if  any)  is  noise-free,  while  the  observed  output  may  be 
corrupted  by  white-noise. 

Essentially,  the  instrumental  variable  method  estimates 
a  set  of  unknown  parameters  from  an  array  of  linear  algebraic 
equations  involving  these  parameters,  a  set  of  controlled- 
input  observations,  and  a  set  of  noise-corrupted  output 
observations.  The  output  measurement-noise  sequence  is 
assumed  to  be  a  sample  from  a  zero-mean  stationary  noise 
process  whose  covariance  function,  r(t),  tends  to  zero  at  a 
rate  faster  than  1/t  as  t  +  ®.  The  control  1 ed- input  sequence 
is  assumed  to  be  either  deterministic,  or  else  a  sample  from 
a  stationary  random  process  statistically  independent  of 
the  output  noise  process.  A  further  restriction  in  the 
method  is  that  the  number  of  equations  in  the  array  must  be 
greater  than  the  number  of  unknown  parameters  to  be  esti¬ 
mated.  Wong  and  Polak  show  that  by  first  premultiplying  the 
given  array  by  a  suitable  rectangular  matrix,  called  the 
instrumental  matrix,  a  square  invertible  array  is  obtained 
which  may  then  be  solved  for  the  required  parameter  estimates. 
The  elements  of  the  instrumental  matrix  are  called  instru¬ 
mental  variables. 

Since  the  instrumental  variable  method  cannot  be  used 
with  certain  inputs  (e.g.,  control  1 ed- input s  corrupted  by 
noise),  Wong  and  Polak  derive  a  necessary  and  sufficient 
condition  for  the  instrumental  matrix  to  exist  when  the  input 
is  deterministic  and  bounded.  They  also  show  that  under 
suitable  conditions  on  the  system  input  and  the  output 
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measurement-noise,  optimal  instrumental  variables  exist 
corresponding  to  two  criteria  of  optimality.  The  method, 
when  applicable,  is  always  shown  to  yield  consistent  esti¬ 
mates  of  the  parameters. 

Adaptive  Estimation 

In  most  physical  processes,  the  mathematical  model 
representation  of  the  process  can  usually  be  specified  only 
up  to  an  unknown  set  of  parameters.  If  X(t)  denotes  the 
state  vector  of  a  process  and  0  the  time-invariant  parameter 
vector,  then  the  problem  of  obtaining  the  optimal  (in  some 
sense)  estimate  of  the  state  vector  under  the  condition  of 
the  uncertainty  of  the  parameter  vector  is  often  referred 
to  as  adaptive  estimation.  In  essence,  it  is  the  simultan¬ 
eous  estimation  of  the  state  vector  and  identification  of  the 
process  parameters. 

The  problem  of  the  optimal  estimation  of  a  sampled, 
Gauss-Markov  stochastic  process  when  certain  parameters  of 
the  process  are  initially  unknown  is  discussed  by  Magill 
(Ref  26).  His  approach  is  to  assume  that  the  unknown  para¬ 
meters  belong  to  a  set  that  contains  a  finite  number  of 
possibilities  which  are  known  a  priori.  The  given  stochastic 
process  may  then  be  represented  by  a  set  of  "elemental 
stochastic  processes"  (one  for  each  possible  combination  of 
parameters),  a  switch  that  is  permanently  but  randomly 
connected  to  one  of  the  elemental  stochastic  processes,  and 
a  set  of  a  priori  probabilities  for  the  set  of  switch 
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positions.  The  elemental  stochastic  processes  are  repre¬ 
sented  as  the  outputs  of  linear  dynamic  systems  excited  by 
white  gaussian  noise. 

Magill  shows  that  the  optimal  estimate  is  one  that  is 
obtained  by  minimizing  a  generalized  mean-square-error 
criterion.  In  particular,  he  shows  that  the  optimal  state 
estimate  is  obtained  by  taking  the  complete  set  of  state 
estimates  conditioned  on  all  available  output  observations, 
weighting  each  with  the  conditional  probability  that  the 
appropriate  parameter  vector  is  true,  and  summing  over  the 
space  of  all  possible  parameter  values.  The  conditional 
weighting  coefficients  are  determined  by  an  application  of 
Bayes*  rule,  where  the  elemental  stochastic  processes  are 
assumed  to  have  a  multivariate  normal  density  function. 

Hillborn  and  Lainiotis  (Ref  22)  have  extended  the  work 
of  Magill  to  cover  non-Gaussian ,  Markov  processes  with 
unknown  parameters.  The  processes  are  characterized  as 
having  probability  distributions  of  known  functional  form, 
but  containing  a  set  of  unknown  parameters.  It  is  assumed 
that  all  initial  knowledge  of  the  unknown  parameters  can  be 
expressed  by  appropriate  probability  distributions  (Bayesian 
estimation),  with  the  result  that  the  optimal  state  estimate 
may  be  determined  without  going  through  the  intermediate 
step  of  parameter  identification  per  se.  For  sampled 
stochastic  processes  having  finite-state  unknown  parameters 
and  a  generalized  Markov  property,  Hillborn  and  Lainiotis 
show  that  the  optimal  state  estimates  can  be  formed  from  a 
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set  of  optimal  estimates  based  on  known  parameters,  and  a 
set  of  "learning"  statistics  which  are  updated  recursively. 
Necessary  and  sufficient  conditions  are  also  established  for 
the  convergence  or  "learning"  of  the  constant  unknown 
parameters . 

The  adaptive  estimation  techniques  discussed  in  the 
previous  two  papers  have  been  extended  by  Lainiotis  (Ref  24) 
to  cover  both  structural  and  parameter  adaptation,  where 
structural  adaptation  refers  to  the  unknown  dimensionality 
of  the  state  vector.  The  Bayesian  approach  to  the  adaptive 
estimation  problem  is  utilized  by  Lainiotis  in  assuming  that 
the  system  generating  the  random  processes  involved  is 
chosen  at  random  from  a  finite  collection  of  possible  systems 
These  systems  are  characterized  as  having  state  vector 
dimensionality  o,  parameter  vector  value  8a,  and  known  or 
assumed  a  priori  probability  Pr[a,0o].  It  is  further  assumed 
that  the  model  structure  a  is  less  than  some  fixed  number  n, 
and  the  defining  parameter  vector  0  is  time- invariant .  The 
assumption  of  an  upper  bound  n  to  the  system  dimensionality 
permits  structural  adaptation  to  be  imbedded  into  parameter 
adaptation.  This  is  accomplished  by  arbitrarily  choosing 
the  dimensionality  of  the  model  as  n,  and  determining  those 
system  parameters  that  are  zero  if  the  correct  model  struc¬ 
ture  is  less  than  n. 

Using  the  approach  of  augmenting  the  state  vector  with 
the  parameter  vector,  Lainiotis  shows  that  the  optimal  state 
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estimator  can  be  decomposed  into  two  parts:  a  linear  non- 
adaptive  part  consisting  of  a  bank  of  ordinary  Kalman-Bucy 
filters  matched  to  each  admissible  value  of  the  unknown 
parameter  vector,  and  a  nonlinear  part  consisting  of  likeli¬ 
hood  ratios  that  incorporate  the  adaptive  learning  nature 
of  the  estimator.  In  addition,  the  conditional-error  co- 
variance  matrix  is  also  derived  for  on-line  performance 
evaluation . 

Model  Reference 

Model  reference  or  model  tracking  techniques  are  ideally 
suited  to  applications  where  it  is  desired  to  obtain  the 
parameter  identification  result  recursively  as  the  process 
develops  (called  on-line  or  real-time  identification).  Such 
techniques  also  enjoy  wide  applicability  in  the  identifica¬ 
tion  of  processes  with  time-varying  parameters.  A  typical 
model  reference  diagram  is  shown  in  Fig.  3.  Use  of  the  model 


Fig.  3.  Identification  by  Model  Reference. 
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reference  technique  begins  by  formulating,  for  the  physical  , 

process  under  consideration,  a  mathematical  model  having 
adjustable  parameters.  The  system  input  signal  is  then 
simultaneously  fed  into  the  actual  process  and  its  mathe¬ 
matical  model,  after  which  the  output  of  each  is  fed  into 
a  model  adjustment  mechanism.  This  mechanism  operates  on 
the  adjustable  model  parameters  in  such  a  way  as  to  make 

the  model  output  follow,  as  closely  as  possible,  the  output 

?  J 

of  the  actual  process. 

I 

A  model  reference  technique  for  identifying  the 
parameters  of  dynamic  systems  modeled  by  differential 
equations  is  discussed  by  Hsia  and  Vimolvanich  (Ref  37). 

An  identification  algorithm,  based  on  the  learning  model 
concept,  is  derived  using  the  state-variable  formulation. 

A  multiple-input,  single-output  noise-free  system  is  first 
considered,  with  the  results  then  extended  to  the  case  where 
the  system  has  multiple-outputs  and  noise  present.  The 

i  i 

» 

|  system  under  consideration  is  characterized  by  an  nth  order 

|  vector  differential  equation,  and  a  corresponding  model  with 

•  unknown  but  adjustable  parameters  is  assumed.  Arbitrary 

| 

!  initial  values  of  the  model  parameters  are  assumed,  and  the 

i 

i  identification  objective  becomes  to  adjust  the  model  para- 

i 

!  meters  so  that  they  converge  toward  the  corresponding  para- 

:  i 

■  meter  values  of  the  actual  system.  The  parameter  adjustment 

!  I 

I  ■  procedure  adopted  by  Hsia  and  Vimolvanich  is  to  minimize  the 

[  identification  error,  i.e.,  the  difference  between  the 

f 

f  ’  ‘ 
y 

i 


24 


GSA/MA/72- 5 


system's  output  and  the  model's  output.  To  accomplish  this, 
equations  are  derived  for  the  adjustment  of  the  model  para¬ 
meters,  and  the  identification  error  is  shown  to  converge  in 
the  mean. 

Stochastic  Approximation 

Stochastic  approximation  is  an  identification  method 
that  can  be  used  when  the  system  input  is  assumed  to  be  a 
stationary  stochastic  process.  Although  some  variations  of 
the  method  may  be  used  without  prior  knowledge  of  the 
statistics  of  the  process,  most  require  knowledge  of  the 
noise  covariances  to  yield  consistent  estimates.  The 
parameter  estimates  obtained  using  stochastic  approximation 
methods  generally  have  larger  variances  than  those  obtained 
using  other  methods  (e.g.,  least-squares  estimates). 

Ho  and  Lee  (Ref  51)  consider  the  problem  of  deriving  a 
real-time-convergence  identification  scheme  for  linear 
dynamical  systems  using  stochastic  approximation.  A  linear 
discrete  model  in  state  variable  form  is  used  to  derive  an 
algorithm  for  determining  the  elements  of  the  state  transi¬ 
tion  matrix  (in  canonical  or  phase  variable  form).  The 
method  is  based  upon  the  following  assumptions:  the  system 
is  excited  by  a  zero-mean,  white-noise  process  having  known 
variance;  the  system  output  is  measured  exactly,  i.e.,  the 
output  measurement-noise  is  zero;  and  the  transfer  function 
between  the  input  noise  sequence  and  the  system  output  has 
no  numerator  dynamics.  The  resulting  algorithm  is  shown  to 
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yield  estimates  that,  although  not  optimal  in  a  stochastic 
sense,  nevertheless  represent  the  least-square  fit  to  the 
measurement  data.  The  parameter  estimates  obtained  using 
the  algorithm  are  shown  to  converge  to  their  true  values  in 
the  mean-square  sense. 

The  work  of  Ho  and  Lee  has  been  generalized  by  Saridis 
and  Stein  (Ref  58)  to  cover  the  on-line  identification  of 
forced,  discrete  linear  systems  from  a  sequence  of  white- 
noise-corrupted  output  measurements.  The  system  is  modeled 
in  canonical  or  phase  variable  form  using  the  state  variable 
representation.  The  Robbins -Monro  stochastic  approximation 
procedure  is  used  to  derive  an  identification  algorithm  that 
uses  only  measurements  of  the  system  input  and  output,  and 
does  not  require  knowledge  of  the  output  measurement  noise 
statistics.  However,  if  the  input  measurements  are  also 
corrupted  with  white-noise,  the  algorithm  requires  knowledge 
of  the  variance  of  this  noise  sequence.  The  algorithm  is 
also  shown  to  converge  to  the  true  value  of  the  parameters 
in  the  mean-square  sense. 

•A  slightly  different  approach  to  the  parameter  identi¬ 
fication  problem  using  stochastic  approximation  is  discussed 
by  Sakrison  (Ref  57).  he  considers  the  on-line  estimation 
of  the  poles  and  zeros  of  a  rational  transfer  function  whose 
order  is  known  not  to  exceed  some  fixed  number  n.  A  linear, 
time-invariant  system  model  is  assumed,  with  both  the  system 
input  and  output  observable  in  the  presence  of  mutually 
uncorrclated  ,  stationary  random  noise  processes  having 
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zero-mean  and  known  correlation  functions.  The  approach 
taken  by  Sakrison  is  to  use  stochastic  approximation  methods 
to  compute  on-line  an  optimum  filter,  from  which  the  desired 
coefficients  in  the  system  transfer  function  may  be  obtained. 
The  parameter  estimates  generated  using  the  derived  algo¬ 
rithm  are  shown  to  converge,  in  the  mean-square  sense,  to 
their  true  values. 

Linear  Least  Squares 

Steiglitz  and  McBride  (Ref  69)  discuss  an  iterative 
technique  to  identify  a  linear  system  from  samples  of  its 
input  and  output  in  the  presence  of  noise.  The  model  assumed 
is  a  linear  sampled-data  system,  with  input  and  output 
related  by  a  rational  2-transform.  The  approach  selected 
by  Steiglitz  and  McBride  is  to  minimize  the  mean-square 
error  between  the  model  output  and  the  observed  output  of 
the  plant.  A  technique  is  derived  for  carrying  out  this 
minimization  by  iteratively  performing  a  sequence  of  Kalman 
least-squares  linear  regressions  on  the  system's  input-output 
data.  Each  iteration  is  shown  to  be  computationally  equiva¬ 
lent  to  an  ordinary  Kalman  linear  regression,  except  for 
prefiltcring  of  the  input  and  output  data.  Experimental 
results  are  presented  to  show  that  the  iterative  identifica¬ 
tion  method  converges  more  slowly  than  the  ordinary  Kalman 
linear  regression  method,  but  results  in  improved  parameter 
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The  problem  of  identifying,  from  input-output  measure¬ 
ments,  the  unknown  parameters  in  linear  dynamical  systems 
with  transport  lags  is  considered  by  Hsia  (Ref  63).  An  nth 
order,  linear  di f f erenti al -dif f erence  equation  model  is 
assumed,  with  noise-free  input  and  output.  The  identifica¬ 
tion  technique  derived  by  Hsia  essentially  involves  two 
steps:  (1)  the  use  of  a  finite  difference  technique  to 

reduce  the  differential-difference  equation  to  an  ordinary 
difference  equation,  and  (2)  estimation  of  the  system 
parameters  through  identification  of  the  resulting  discrete 
model  via  Kalman's  least-square  method. 

Correlation 

Anderson,  et.  al.  (Ref  72)  discuss  the  problem  of 
determining  consistent  estimates  of  the  parameters  of  a 
linear  dynamic  system.  A  discrete  linear  model  in  the  state 
variable  formulation  is  assumed,  with  uncorrelated,  additive 
noise  present  at  both  the  input  and  output.  The  state 
transition  matrix  and  input-output  noise  covariance  matrices 
are  assumed  to  be  unknown,  and  thus  require  identification. 
Estimates  of  these  matrices  are  derived  and  shown  to  be 
strongly  consistent  when  the  linear  system  is  stable,  i.e., 
when  the  eigenvalues  of  the  state  transition  matrix  lie  within 
the  unit  circle.  In  addition,  the  asymptotic  properties  of 
the  model  are  investigated,  and  shown  to  remain  unchanged 
when  the  unknown  system  parameters  are  replaced  by  their 
strongly  consistent  estimates. 
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The  problem  of  estimating  the  autoregressive  parameters 
of  a  mixed  moving-average  time  series  of  known  order  using 
output  data  alone  is  considered  by  Gersch  (Ref  75).  The 
problem  is  shown  to  be  equivalent  to  the  estimation  of  the 
denominator  terms  of  a  scalar  transfer  function  of  a 
stationary,  linear  discrete-time  system.  Three  formulations 
of  the  system  are  discussed:  state  variable,  time  series, 
and  Z-transform  representations.  It  is  assumed  that  the 
system  is  excited  by  a  zero-mean,  uncorrelated  input  noise 
sequence  of  unknown  variance,  and  the  output  observations 
are  exact  (i.e.,  the  output  measurement-noise  is  zero).  The 
approach  followed  by  Gersch  is  to  derive  a  modified  set  of 
Yule-Walker  equations,  which  are  then  used  to  solve  for  an 
asymptotically  unbiased  estimator  of  the  unknown  autoregres¬ 
sive  parameters.  The  estimator  is  also  shown  to  be  unbiased 
in  the  presence  of  additive  white  output  measurement-noise 
of  arbitrary  finite  correlation  time. 

In  the  next  chapter,  the  on-line  identification  of 
discrete  linear  stochastic  systems  is  considered.  The 
identification  methods  discussed  are  based  largely  upon  the 
statistical  correlation  properties  of  the  output  measurements. 
As  such,  the  methods  are  similar  to  those  employed  in  time 
series  analysis  (Refs  75,  80,  and  82). 
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IV.  On-Line  Identification  of  Pi screte 
Linear  Stochastic  Systems 

In  this  chapter,  attention  is  turned  to  an  in-depth 
examination  of  an  identification  method  recently  proposed 
by  Mehra  (Refs.  78  and  79).  The  method  is  more  general  than 
some  others  currently  appearing  in  the  literature  (e.g., 
instrumental  variables  and  stochastic  approximation)  ,  in 
that  complete  knowledge  of  the  input-output  noise  covariance 
matrices  is  not  required.  Additionally,  the  method  is 
capable  of  providing  on-line  identification  when  implemented 
using  a  digital  computer. 

In  the  interest  of  notational  simplicity  and  continuity, 
only  the  discrete  case  will  be  discussed  in  the  remainder  of 
this  paper.  This  should  cause  no  serious  limitations,  since 
the  method  can  be  extended  to  the  continuous  case  with 
little  difficulty. 

Preliminaries 

For  the  convenience  of  the  reader,  a  few  preliminary 
definitions  are  introduced  in  this  section.  The  terminology 
and  notation  appearing  in  the  remainder  of  this  paper  are 
fairly  standard,  being  used  rather  extensively  in  the 
literature  of  system  identification  and  estimation  (Ref 
180:106-121)  . 

Definition  1_:  A  stochastic  process  is  a  family  of 
random  vectors  (X(k),  kt  I }  indexed  by  a  parameter  k  all  of 
whose  values  lie  in  some  appropriate  index  set  I. 
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Definition  2_:  Let  X  be  an  n  vector.  Then  the  stochas¬ 
tic  process  {X(k),  kel}  is  said  to  be  independent  if,  for 
any  m  time  points  k^,  ....  km  in  I,  where  m  is  any  integer, 
the  joint  probability  distribution  function  of  the  m  random 

n  vectors  X(k,),  ...,  X(k  )  is  equal  to  the  product  of  the 
-  l  ~  m 

probability  distribution  functions  of  the  m  individual  n 
vectors.  That  is 

m 

P [X (k. )  <  x,,  ...,  X(k  )  <  x  ]  =  n  P . [ X (k . )  <  x.] 

i  =  l 

for  all  n  vectors  x, ,  ....  x  . 

- 1  -m 

Definition  3:  Let  { X (k )  ,  kel}  and  (Y(k),  kel)  be  two 

■■  — ■  i-  —  —  ^ 

stochastic  processes,  where  X  is  an  n  vector  and  Y  is  a  p 

*N» 

vector.  The  two  stochastic  processes  are  said  to  be 
independent  of  each  other  if,  for  any  m  time  points 
kj,  ....  km  in  I,  where  m  is  any  integer,  the  joint  proba¬ 
bility  distribution  function  of  tha  2m  random  vectors 

X(kj) . Xfk^),  Y(kj),  ...»  Ytk^)  is  equal  to  the  product 

of  joint  probability  distribution  functions  of  the  m  random 
n  vectors  and  m  random  p  vectors.  That  is, 

P [X (k  )  <  x  ,  ....  X (k  )  <  x  ,  Y(k  )  <  y  ,  ....  Y(k  )  <  y  ] 

~  i  -  -  m  -  - m  ~  i  -  ~l  ~  m  -  ~ m 

=  PjlXCkj)  <  xr  ....  X(km)  <  xm|P2[Y(k1)  <  y  j  ,  ...,Y(km)  <  yj 

for  all  n  vectors  x, ,  ...,  x  and  all  p  vectors  y,  ,  ...,  y  . 

-1’  ~m  '  ~l  ~m 

Definiti oji  £ :  A  stochastic  process  { X  ( k )  ,  kel)  is  said 
to  be  uncorrelated  if,  for  all  j,  ktl,  j  t  k 
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E[X(j)X'(k)]  =  E[X(j)]E[X'(k)] 

Definition  5:  Two  stochastic  processes  {X(k),  kel} 

1  "  '  *s» 

and  {Y(k),  kel}  are  said  to  be  uncorrelated  if,  for  all 

j,  kel 

E[X(j)Y'(k)]  =  E[X(j)]E[Y*  ( k ) ] 

Definition  6^:  A  stochastic  process  is  said  to  be 
stationary  if  the  probability  laws  governing  the  mechanism 
producing  the  process  remain  time- invariant  as  the  process 
evolves  in  time. 

Definition  1_:  A  stochastic  process  (X(k),  kel}  is  said 
to  be  gaussian  or  normal  if,  for  any  m  time  points  kj,  ....  k^ 
in  I,  where  m  is  any  integer,  the  set  of  m  random  n  vectors 
X(k^),  ....  x(km)  is  jointly  gaussian  distributed. 

Definition  A  stochastic  process  {X(k),  kel}  is  said 

to  be  a  gaussian  white  process  if,  for  any  m  time  points 
kj,  .  ..,  k^  in  I,  where  m  is  any  integer,  the  m  random  n 
vectors  X(k^),  ....  Xfk^)  are  independent  gaussian  random 
vectors . 

Definition  9_:  A  stochastic  process  {X(k),  kel}  is  said 
to  be  a  Markov  process  if,  for  any  m  time  points 

k,  <  k_  <  ...  <  k  in  I,  where  m  is  any  integer,  the  condi- 

i  i  m 

tion  probability  distribution  function  of  x(km)  f°r  Riven 
values  of  X(kj),  ...,  ^as  the  property  that 
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1 


5  -  ?! . 5<k„.i)  *  S».i> 

■  p‘k‘k»)  :  ;J*<km-i)  *  Jm-ii 

for  all  n  vectors  x,  ,  ....  x  . 

- 1  ~m 

Definition  1 0 :  A  stochastic  process  is  said  to  be 
Gauss-Markov  if  and  only  if  it  is  both  Gaussian  and  Markov, 


A 


The  Model 

The  system  model  proposed  by  Mehra  has  been  discussed 
many  times  in  the  system  estimation  and  identification 
literature,  e.g.,  Kalman  (Ref  100),  Ho  and  Lee  (Ref  51), 
and  Meditch  (Ref  180).  Essentially,  the  model  represents  a 
discrete  linear  dynamic  system  operating  in  a  stochastic 
environment,  and  is  formulated  using  the  state  variable 
approach.  A  block  diagram  representation  of  the  model  is 
shown  in  Fig.  4. 


Fig.  4.  Discrete  Linear  Stochastic  Model. 
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The  system  dynamics  and  output  measurement  dynamics  are 
governed,  respectively,  by  the  following  two  linear  differ¬ 
ence  equations: 


X (k+ 1 )  =  4>X(k)  ♦  TU(k)  (1) 

•*  «v 

Z  (k)  =  HX  (k)  ♦  V  (k)  (2) 

for  k  *  0,  1,  ....  where 

X  is  the  n  x  1  state  vector 

♦  is  the  n  x  n  state  transition  matrix  (constant) 

U  is  a  p  x  1  vector  of  gaussian  white-noise  called  the 
system  disturbance  vector 

T  is  an  l,  x  p  matrix  called  the  disturbance  transition 
matrix  (constant) 

Z  is  a  r  x  1  vector  called  the  output  measurement 
vector 

H  is  a  v  x  n  matrix  called  the  measurement  matrix 
(constant ) 

V  is  a  r  x  1  vector  of  gaussian  white-noise  called  the 
measurement  error  vector 

Equation  (1)  is  often  called  the  state  equation ,  and  Eq  (2) 

the  measurement  equation . 

Assumpt ions 

1.  E [U  (k)  )  =  0  (3) 

E [U ( j )U  * (k) ]  =  Q5jk  (4) 
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E [ y (k)  ]  =  0 

(5) 

E[V(j)V’  (k)] 

■*»  > 

=  R<5  ■  . 

-  3k 

(6) 

E [U ( j )V '  (k)  ] 

=  0 

(7) 

E[X(0)]  =  0 

(8) 

E [X (0) X ' (0)] 

=  P(0) 

(9) 

for  j,  k  =  0,  1,  ....  where 

Q  is  a  p  x  p  positive  definite  matrix  called  the 
input -noi se  covariance  matrix  (constant) 

R  is  a  r  x  r  positive  definite  matrix  called  the 
output-noise  covariance  matrix  (constant) 

X (0)  is  a  zero-mean  guassian  n  vector  called  the  initial 
state  vector 

P(0)  is  an  n  x  n  matrix  called  the  initial  state 
covariance  matrix 

6.,  is  the  Kronecker  delta 
3* 

2.  The  system  is  completely  observable  and  controllable. 
Kalman  (Ref  100)  shows  that  this  is  equivalent  to  the 


following 

two 

conditions  : 

rank 

[r. 

♦r.  ....  =  n 

(10) 

rank 

[«' 

,  4>'H'  ,  ....  ($n_1)  'll'  ]  =  n 

(ID 

This  condition,  along  with  the  positive  definiteness  of 
Q  and  R,  is  necessary  to  ensure  the  asymptotic  global 
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stability  of  the  Kalman  filter,  as  shown  by  Deyst  and  Price 
(Ref  95) . 

3.  The  state  transition  matrix,  <t>,  is  nonsingular, 
with  all  eigenvalues  located  inside  the  unit  circle. 

DeRusso,  et.  al.  (Ref  177:447)  show  that  this  condition  is 
necessary  to  ensure  stable  dynamics  for  a  time- invariant 
linear  system. 

4.  The  system  is  minimum  phase,  i.e.,  the  system 
transfer  function  has  no  zeros  located  in  the  right-half 
plane.  Truxal  (Ref  182:426-427)  shows  that  this  condition 
is  necessary  to  ensure  the  physical  realizability  of  a 
linear  system. 

5.  The  system  is  time-invariant,  and  the  identifica¬ 
tion  procedure  begins  after  steady  state  conditions  have 
been  reached.  This  assumption  is  necessary  to  allow  the 
system  matrices  $ ,  T,  and  H,  the  noise  covariance  matrices 
Q  and  R,  and  the  covariance  matrix  of  the  state  vector  X(*) 
to  be  treated  as  constants. 

6.  The  problem  of  interest  is  to  identify  the  unknown 

matrices  4>,  T,  H,  R,  and  Q  from  a  record  of  the  output 
measurements  Z(0),  Z(l) . 

Propcrt i es 

1.  Meditch  (Ref  180:168-169)  shows  that  the  model 
described  by  Eqs  (1)  through  (9)  has  the  following  properties 
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a.  The  stochastic  processes  (X(k),  k  =  0,  1,  ...} 
and  (Z(i),  i  =  0,  1,  ...,  j}  are  Gauss-Markov 
sequences  with  identically  zero  means. 

b.  E [X (j )U  *  (k) ]  =  0 

V  k  >  j.  j  =  0,  1,  ...  (12) 

c.  EtZ(j)U'(k)]  =  0 

V  k  >  j,  j  =0,  1,  ...  (13) 

d.  E [X ( j )V '  (k) ]  =  0 

V  j  and  k  (14) 

e.  E[Z(j)V'(k)]  =  0 

V  k  >  j,  j  =  0,  1,  ...  (15) 

These  properties  follow  from  the  linearity  inherent  in  the 
model,  and  the  fact  that  X(k)  depends  only  upon  X(0), 

U(0),  ...,  U(k-l),  where  (U(j),  j  =  0,  1,  ...}  and 
{V(j),  j  ■  0 ,  1 ,  ...}  are  uncorrelated  zero-mean  white 
gaussian  sequences  independent  of  the  gaussian  random 
vector  X (0) . 

2.  At  this  point,  it  is  convenient  to  state  some 
general  properties  from  optimal  estimation  theory  (Refs  100 
and  180) . 

a.  The  optima  1  estimate  of  the  state  X(k)  given 
the  output  measurements  Z(s),  0  <  s  <  j,  is 
given  by  the  conditional  mean  of  X(k),  denoted 
as  follows: 
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X(k|j)  =  E [X (k) | Z  (s)  ,  0  <  s  <  j]  (16) 

If  k  >  j,  the  estimation  problem  is  one  of 
prediction ;  if  k  *  j,  the  estimation  problem 
is  one  of  f i ltering ;  and  if  k  <  j ,  the  problem 
is  one  of  smoothing .  The  filtered  and  predicted 
state  estimates  are  used  in  the  adaptive  Kalman 
filtering  approach  to  the  identification 
problem  discussed  in  Chapter  V. 

b.  Let  the  estimat ion  error  of  the  state  X(k)  be 
defined  by 

X(k|j)  =  X(k)  -  X(k|j)  (17) 

Then  the  single-stage  optimal  predicted 
estimate  of  X(k+1)  is  given  by  the  following 
relation : 

A  A 

X (k+ 1 | k )  =  $X  (k  |  k)  (18) 

A 

where  X ( k | k )  is  the  optimal  f i ltered  estimate 
of  X  (k)  . 

A 

c.  X(k|j)  is  a  linear  estimate,  i.e.,  a  linear 
combination  of  the  available  output  measure¬ 
ments  Z  (s)  ,  0  <  s  <  j. 

A 

d .  X  (k | j )  is  unique  . 
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e.  X  (k | j  )  and  X(k|j)  are  gaussian  random  n 
vectors . 

f.  The  stochastic  process  (X(k+l|k),  k  =  0,  1,  ...} 
is  a  zero-mean  Gauss -Markov  sequence. 

g.  X(k|j)  is  independent  of  any  linear  combina¬ 
tion  of  the  available  output  measurements.  In 
particular,  X(k|j)  is  independent  of  X(s|t), 
which  implies  that 

E I X (k | j ) X ' ( s | t )  ]  «  0  (19) 

for  all  k ,  j ,  s ,  and  t . 

h.  If  the  r  x  1  vector  Y(k)  is  defined  by 

Y (k)  =  HX(k) 

so  that  Y(k|k-1)  =  HX(kjk-l),  then  Z(j)  and 
Y (k | k- 1 )  are  independent  random  vectors, 
implying  that 

E[Z(j)Y' (k|k-l)]  =  0  (20) 

for  all  k  and  j  . 

Identification  of  4> 

The  identification  method  proposed  by  Mehra  (Ref  78) 
to  estimate  the  elements  of  the  state  transition  matrix  4> 
is  based  on  the  autocorrelation  function  of  the  output 
measurements  Z(i).  In  order  to  formally  derive  the  method, 
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however,  it  is  first  necessary  to  establish  that  under 
steady  state  conditions  the  stochastic  process 
{Z(i),  i  =  0,  1,  ...}  is  a  stationary  gaussian  sequence. 
Using  Eqs  (2),  (6),  and  (14),  the  covariance  matrix  of  the 
output  measurements  may  be  written  as  follows  for  k  >  0: 


E [Z  (i ) Z  '  (i-k)J  =  E{[HX(i)*V(i)] [HX(i-k)  +  V(i-k)]  ’} 


=  HE[X(i)X' (i-k) ]H'+HE[X(i)V' (i-k) ] 


♦  E [ V ( i ) X ' (i-k)]H'+E[V(i)V' (i-k)] 


=  HE [ X ( i ) X  '  (i  -  k)  ]  11 1 


Rewriting  Eq  (1)  in  terms  of  the  initial  state  vector  X(0) 


results  in 


i  -  1  .  . 

X  ( i )  =  <t>1X(0)  ♦  l  4- 1_1’s  ru(s)  (22) 

s  =  0 

i  v  i -k - 1  -vi 

X(i-k)  =  X(0)  +  l  4>1'K'1_s  TU(s)  (23) 

s  =  0  " 


Solving  Eq  (23)  for  X ( 0 )  and  substituting  the  result  into 


(22)  yields 


.  .  i-k-1 

X  ( i )  =  4>1(4>K1X(i-k)  -  l  <i>‘  1  'S  TU(s)] 
"  ”  "  s=0  ~ 


♦  i  4* 1  ” 1 " s  ru(s) 
s  =  0  ~ 


4>X(i-k)  ♦  l  4> 
s  =  i  -  k 


i  -  1  -  s 


ru  (s) 
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Substituting  Eq  (24)  into  (21),  and  using  Eq  (12)  yields 

v  i-i 

E  [  Z  ( i )  Z  '  ( i  -  k )  ]  =  HE{ [$KX(i-k)  ♦  £  «>  "S  TU(s)]  • 

s  =  i  -  k 

•  [X 1 (i-k) ] } H  * 

=  H$kE[X(i-k)X» (i-k) )  H  • 
i-i  .  . 

♦  H  l  S’1"  s  rE[U(s)X' (i-k)jH' 
s  =  i  -  k 

■  H*kE[X(i-k)X' (i-k)]H'  (25) 

If  the  n  x  n  covariance  matrix  of  the  state  vector  X(i)  is 
defined  by 


P(i)  =  E[X(i)X'(i)]  (26) 

mrn  mm  mrn 

then  the  following  expression  may  be  obtained  for  Eq  (25): 

E [Z (i)Z •  (i-k) ]  =  H$kP (i-k)H'  (27) 

By  assumption  five  of  the  model,  however,  steady  state 
conditions  have  been  reached.  This  implies  that  the  co- 
variance  matrix  of  the  state  vector  X(>)  is  constant  and 
may  be  written  as 


P  -  P(-)  (28) 

Substituting  Eq  (28)  into  (27)  results  in  the  following 
relation  for  the  covariance  matrix  of  the  output  measure¬ 


ments  : 
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E  [Z  (  i  )  Z  '  (i-k)J  =  H4>kPH',  k  >  0  (29) 

A  side  condition  on  P  may  be  obtained  by  substituting 
Eq  (1)  into  (26),  and  using  Eqs  (4),  (12),  and  (28): 

P (i)  =  E{[4X(i-l)  +  ru(i-l)] ($X(i-l)  +  ru(i-l)]  '} 

*  4>E[X(i-l)X'  (i-1)  ]4>  '  +  rQr' 

*  4>P(i-l)4>' +rQr’ 

or 

p  =  4>p$'+rQr*  (30) 

Similarly,  using  Eqs  (2),  (6),  (14),  and  (28),  the 
autocorrelation  function  of  Z(i)  may  be  written  as  follows 
for  the  case  of  k  =  0: 

E [Z  (i) Z  •  (i) ]  =  E{[HX(i)+V(i)][HX(i)+V(i)]'} 

=  HE [ X (i ) X ' (i) ] H ' +R 

=  HPH'+R,  k  =  0  (31 ) 

Therefore,  it  can  be  seen  from  Eqs  (29)  and  (31)  that  under 
steady  state  conditions,  E [ Z (i  )  Z  '  ( i - k)  ]  is  independent  of  i 
for  all  k.  This  implies  that  {Z(i),  i  =  0,  1,  ...)  is  a 
stationary  sequence,  which  is  also  gaussian  by  property  one 
of  the  model. 

It  should  be  noted  that  in  general,  the  autocorrelation 
function  of  the  output  measurements  is  not  a  scalar  but 
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rather  a  r  x  r  matrix.  However,  if  only  scalar  outputs  are 
considered  (i.e.,  r  *  1),  the  corresponding  autocorrelation 
function  is  also  a  scalar.  In  the  remainder  of  this  paper, 
therefore,  only  the  scalar-output  case  will  be  considered, 
and  the  scalar  autocorrelation  function  of  the  output 
measurements  is  given  by 


C 


k 


HPH '  ♦  R, 
H4>kPH ' 


k  =  0 
k  >  0 


(32) 


Having  established  the  stationarity  of  the  above 
vector-input  scalar-output  system,  attention  is  returned  to 
a  consideration  of  Mehra’s  proposed  on-line  identification 
method  (Ref  78).  The  autocorrelation  function  of  the  output 
measurements,  C^,  is  used  to  derive  a  set  of  equations 
analogous  to  the  Yule-Walker  equations  in  the  statistical 
analysis  of  purely  autoregressive  time  series  (Ref  183). 

The  derived  set  of  equations  are  then  shown  to  form  the 
basis  for  the  proposed  on-line  identification  method.  Using 
Eq  (32)  for  k  =  j,  j  +  1,  ...»  j  +  n  -  1,  where  n  is  the 
order  of  the  system  and  j  >  1,  results  in 
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where 


B.  = 
-3 


H4>^ 

H 

"*• 

H4>^  +  1 

H4> 

««. 

• 

• 

• 

• 

• 

• 

• 

Ht"-1 

(34) 


It  should  be  observed  that  the  n  x  n  matrix  B.  is  nons inrul ar 

-3 

This  follows  from  assumptions  three  and  two  of  the  model 
which  state,  respectively,  that  the  state  transition  matrix 
♦  is  nonsingular  and  the  n  x  n  observability  matrix  is  of 
full  rank ,  i . e . , 


ran* 


H 

H4> 


H4> 


n  - 1 


=  n 


Therefore,  since  B^  from  Eq  (34)  is  seen  to  be  the  product 
of  the  observability  matrix  having  rank  n  and  the  nonsingular 
matrix  ,  it  follows  that  the  rank  of  B^  is  also  n.  Thus 
the  rows  of  are  linearly  independent,  and  the  nonsingu¬ 
larity  of  bj  is  es  t  at)  l  i  sned  .  solving  for  PH'  from  Eq  (33) 
yields 
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PH*  «  BT1 
~3 


Vi 


C j  +n- 1 


(35) 


Substituting  Eq  (35)  into  (32)  for  k  =  j  ♦  n  results  in 


C.  «  H4>^  +nBT  1 
3 +n  ~~  -3 


(36) 


j  ♦n-  1 


Equation  (36)  represents  a  recursive  relationship  for 
the  autocorrelation  function  C^.  In  its  present  form, 

Eq  (36)  cannot  be  used  in  the  identification  of  <t>  because 
of  the  presence  of  the  unknown  matrix  H.  However,  H  can 
be  eliminated  by  use  of  the  Caley-Harai lton  Theorem  (Ref  177), 
which  states  that  the  square  matrix  4>  satisfies  its  own 
characteristic  equation.  If  the  characteristic  polynomial 
of  is  denoted 


f(A)  *  |  A I  -  |  ■  An  ♦  anAn**  ♦  ...  ♦  a£A  ♦  a^ 


where  A  is  a  root  of  the  characteristic  equation  of  $  and  the 
a^,  1*1,  ...,  n  are  unknown  scalars,  then  this  implies  that 


An  .  n- 1 

v  ♦  a  ♦  ♦ 

-  n- 


♦  ♦  a.4> 

1  ** 


(37) 


45 


GSA/MA/ 72-5 


Premultiplying  Eq  (37)  by  H4>^  and  solving  the  resulting 
equation  for  H4»^+n  yields 


=  - [&i ,  a2 ,  ....  an] (38) 
Postmultiplying  Eq  (38)  by  B.1  results  in 

*  -tai'  a2 »  •»  anl  (39) 

Substituting  Eq  (39)  into  (36),  the  following  result  is 
obtained : 


Equation  (40)  represents  a  modified  set  of  Yule- Walker 
equations  (Ref  183).  These  equations  may  be  written  in 
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matrix  notation  by  allowing  j  to  run  from  1  to  n  in 
Eq  (40): 


"Cn+1 

~C1 

C  2 

• 

• 

• 

n 

3 

_ 1 

al 

Cn-2 

C2 

C  3 

...  C  . 
n*l 

a2 

• 

• 

s  •» 

• 

• 

• 

• 

1 - 

n 

KJ 

3 

1 _ 

C 

n 

Cn*l 

• 

*•*  C2n- 1 

• 

a 

n 

(41) 


The  development  leading  to  the  result  in  Eq  (41)  is  similar 
to  the  approach  used  by  Gersch  in  the  analysis  of  a  mixed 
autoregressive  moving-average  time  series  (Ref  75)  .  In  his 
work,  Gersch  derives  an  unbiased  estimator  for  the  auto¬ 
regressive  time  series  parameters.  He  refers  to  Eq  (41)  as 
the  "normal  equation"  of  the  system,  and  shows  that  for 
j  =  1,  ....  n,  the  n  x  n  matrix  of  autocorrelations  appearing 
therein  is  nonsingular.  Hence,  Eq  (41)  may  be  rewritten  as 


"al‘ 

"C1 

C2 

.  .  .  C 

n 

-1 

"Cn+1 

a2 

C  2 

C  3 

...  C  . 
n+1 

Cn  +  2 

• 

• 

25  • 

• 

• 

• 

• 

• 

a 

n 

_  _ 

C 

n 

Cn+ 1 

• 

*'•  C2n- 1 

_C2n_ 

(42) 


In  order  to  use  Eq  (42)  to  estimate  the  a^,  and  hence 
♦,  it  is  necessary  to  first  estimate  the  autocorrelation 
functions  C^.  If  is  used  to  denote  an  estimate  of  Cj,,  one 
possibility  is  to  make  use  of  the  fact  that  the  2 ( i )  arc  a 
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stationary  gaussian  sequence,  and  substitute  the  empirical 
autocorrelation  functions  of  the  output  measurements  for  C^: 


C  =  1/T  l  Z(i)Z(i-k) 
K  i  =  k 


(43) 


where  T  is  the  number  of  sample  points  of  the  output.  For 
finite  sample  sizes,  the  estimates  of  obtained  using 
Eq  (43)  are  biased,  requiring  that  T  in  the  denominator  be 
replaced  by  T-k  for  unbiased  finite-sample  estimates. 
However,  Heffes  (Ref  97)  shows  that  the  estimates  obtained 
using  Eq  (43)  result  in  less  mean-square  error  than  the 
corresponding  unbiased  estimates,  and  hence  are  preferable. 
Therefore,  substituting  the  values  of  given  by  Eq  (43) 
into  Eq  (42),  and  letting  a.^  denote  an  estimate  of  a i,  the 
following  result  is  obtained: 


A 

ai 

H 

<u 

1 _ 

/V 

C2 

A 

...  c 

n 

-1 

I 

n> 

3 

+ 

*-• 

_ 1 

a2 

C2 

C3 

A 

...  c  . 

n+ 1 

A 

Cn  +  2 

• 

“  ™ 

• 

• 

1 - 

M  >  •  - 

3 

1 _ 

A 

c 

n 

A 

Cn+ 1 

'•*  ^2n- 1 

• 

A 

_C2n  _ 

(44) 


Mehra  shows  that  the  identification  method  represented 
by  Eq  (44)  has  the  following  properties: 

1.  The  a^  are  asymptotically  unbiased,  normal,  and 
consistent  estimates  of  the  a^.  This  follows  from  the  fact 
that  for  large  T,  the  errors  in  estimating  the  a^  are  approx 
imatcly  linearly  related  to  the  errors  in  estimating  the  C^. 
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Parzen  (Ref  80)  shows  that  is  an  asymptotically  unbiased, 
normal,  and  consistent  estimator  of  Ck-  Since  the  errors  in 
estimating  the  are  asymptotically  normal  with  zero  mean, 
the  stated  property  is  established. 

2.  The  identification  procedure  is  capable  of  being 
implemented  on-line,  i.e.,  the  a^  can  be  calculated 
resursively.  If  C**1  is  used  to  denote  the  estimate  of 
based  on  T  ♦  1  sample  points,  it  follows  that 

T+ 1 

-  fTT  I  2Ci)Z(i-k) 

.  ifij.  (Z(T.l)Z(T.l-k)]  ♦  ^  cj  (45) 

The  au  can  then  be  calculated  recursively  using  Eq  (44)  for 
T  4  1  observations: 


~  ~T+  1 
al 

"  «T  ♦  1 
L1 

pT+  1 

pT+ 1 

•  •  ■  v _ 

n 

-1 

pT+l“ 

n+1 

-'T+l 

a2 

pT  ♦  1 
U2 

pT+ 1 

3 

pT+l 
*  *  *  un*l 

pT  ♦  1 
n  +  2 

« 

• 

=  - 

• 

• 

• 

• 

• 

• 

(46) 

~T+1 

a 

n 

cT+1 

n 

• 

pT+  1 
Ln*l 

pT+  1 

•  *  *  u2n-l 

• 

pT+ 1 
2n 

3.  The  identification  method,  while  identifying  the 
coefficients  ai  in  the  characteristic  polynomial  of  4> ,  does 
not  actually  identify  the  elements  of  the  $  matrix  itself. 
This  can  be  a  serious  disadvantage  if  the  exact  structure 
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i 

i 

i 


of  ♦  is  required.  The  reason  for  this  is  that  once  the  a^ 
are  known,  the  Cayley-Hamilton  theorem,  viz.,  Eq  (37),  can 
theoretically  be  used  to  solve  for  the  actual  elements  of 
the  $  matrix.  However,  the  resulting  system  of  n  algebraic 
equations  in  n  unknowns  is  nonlinear,  with  the  result  that 
non-unique  solutions  will  exist.  In  fact,  Birkhoff  and 
MacLane  (Ref  176)  show  that  corresponding  to  each  nth  degree 
monic  polynomial 

g (A)  =  An  +  anAn_1  ♦  ...  ♦  a2X  +  at 

an  n  x  n  matrix  having  characteristic  polynomial  g(A)  can  be 
constructed.  This  matrix  is  called  the  companion  matrix  of 


g (A) ,  and  has 

the 

fol lowing 

form : 

0 

1 

0 

0  ... 

0 

0 

0 

1 

0  ... 

0 

0 

0 

0 

1 

0 

0 

0 

0 

• 

• 

0  ... 

1 

'al 

'a2 

-a3 

*  3  .  •  •  • 

4 

-a 

n 

However,  matrices  other  than  the  companion  matrix  can  be 
found  which  also  satisfy  the  characteristic  equation  of  <J> . 

Identification  of  4>,  T,  and  il  Using  a  Canonical  Form  System 
Representation 

The  possible  disadvantages  associated  with  use  of  the 
above  method  for  identifying  the  state  transition  matrix  4> 
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may  be  eliminated  if  one  is  only  interested  in  modeling  the 
output  observations  of  the  system.  That  is,  if  the  particular 
structure  of  4>  is  unimportant,  then  the  system  represented  by 
Eqs  (1)  and  (2)  can  be  transformed  into  a  canonical  or  phase 
variable  form  (Ref  90)  containing  a  fewer  number  of  para¬ 
meters  to  be  identified.  Consider  the  following  linear 
transformation : 


X* (k)  =  TX(k)  (47) 

t 


where  T  is  a 
state  vector 
suggests  the 


constant  n  x  n  matrix  that  maps  the  n  x  1 
X(k)  into  the  n  x  1  vector  X*(k).  Mehra 
following  definition  for  T: 


H 

H4> 


(48) 


Using  the  assumption  of  complete  observability,  viz.,  Eq  (11), 
it  follows  that  T  is  also  nonsingular.  Premultiplying  Eq  (1) 
by  T  and  using  (47),  Eqs  (1)  and  (2)  become,  respectively 

X*  (k*l )  =  4>*X*(k)  ♦  T*U(k)  (49) 


Z (k)  =  H*X*(k)  +  V (k) 


(50) 


GSA/MA/ 72-5 


where 


<t>*  =  T$T 


-1 


(SI) 


ir 


(52) 


H*  =  HT 


(53) 


Equations  (49)  and  (50)  represent  the  original  system 
in  canonical  or  phase  variable  form,  with  4>*  ,  T* ,  and  H* 
derived  as  follows.  Postmultiplying  Eq  (48)  by  <t,  and  using 
Eq  (34)  with  j  =  1  results  in 


T*  = 


H4> 


H4> 


H<t 


=  !i 


(54) 


from  which 


T"1  =  OB'1 


(55) 


Substituting  Eq  (55)  into  (51)  yields 


<t>*  =  T^C'frB*1) 


t4,2b”  1 


(56) 
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An  expression  for  4>*  that  is  independent  of  T  may  be  obtained 
by  substituting  Eq  (54)  into  (56),  and  using  Eq  (34)  with 
j  *  1: 

t*  =  (T4>)  4>b”  1 


w;;1 


H4> 


n-  1 


4>B 


-1 


H4> 


H4> 


2  - 1 
H*  B. 

—  -  1 


H4>n  bT  1 

••N  *•  1 


—  -s.  —  X 


But,  from  Eq  (54)  it  is  clear  that 


T-?l 


(57) 


— 

— 

h^b'1 

1 

0 

0  ... 

0 

2  - 1 

H4>  Bj 

0 

1 

0  ... 

0 

• 

• 

II 

i 

II 

• 

,, .  n  „  -  1 

H4>  B  j 

0 

0 

0  ... 

1 

— 

— 

(58) 


1 


A 
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Therefore,  using  Eq  (58)  to  solve  for  the  first  n  -  1  rows 
of  4>  *  in  the  representation  given  by  Eq  (57),  and  Eq  (39) 
with  j  =  1  for  the  last  row  of  4>* ,  it  follows  that  Eq  (57) 
can  be  rewritten  as 


0  0  1  ...  0 


(59) 


0  0  0 


a 


1 


-a. 


1 


-a 

n 


It  should  be  noted  in  passing  that  4>*  ,  the  canonical  or 
phase  variable  representation  of  the  state  transition  matrix 
,  is  precisely  the  companion  matrix  of  the  characteristic 
polynomial  of  ♦  that  was  discussed  in  the  previous  section. 
Furthermore,  other  representations  of  'S'  exist,  since  the 
transformation  given  by  Eq  (51)  identifies  $  only  to  within 
a  similarity  transformation  (Ref  176). 

Continuing,  an  expression  for  H*  can  be  obtained  by 
substituting  Eq  (55)  into  (53),  yielding 

H*  =  H<S>B  j  1 

From  Eq  (58),  however,  H$B  *  is  seen  to  be  precisely  the 
first  row  of  the  n  x  n  identity  matrix.  Thus,  H*  may  be 
written  as 


H*  =  [1,  0,  0,  ....  0] 


(60) 
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Finally,  it  remains  to  determine  T* .  Using  the  canonical 
form  system  representation  given  by  Eqs  (45)  and  (50),  and 
denoting  the  covariance  matrix  of  X*(*)  by  P* ,  it  is  easily 
shown  that  the  following  equation,  analogous  to  Eq  (30), 
can  be  derived: 


p*  =  *  *  p  *  <j>  * «  +  r*Qp*  • 


(61) 


Assuming  for  the  moment  that  the  input  noise  covariance 
matrix  Q  is  known  Eq  (61)  still  cannot  be  used  to  solve 
for  T*  because  of  the  unknown  matrix  P*.  However,  an 
auxilary  equation  for  P*  may  be  obtained  as  follows.  Once 
again  starting  with  Eqs  (49)  and  (50)  ,  it  is  straightforward 
to  show  that  the  following  equation,  analogous  to  Eq  (34) 
with  j  =  1,  can  be  obtained: 


5i  = 


H  *  4>  * 

H  *  (  $  *  )  2 


H* ($*) 


(62) 


Equations  (54)  and  (56),  however,  imply  that 


($*)2  =  T4>2b  j  1  (T<t>)  $B  j  1 


I*3*;1 
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and  by  induction 

(4*)1  =  T$1  +  XB j 1  (63) 

Substituting  Eqs  (53)  and  (63)  with  i  =  1,  ...»  n  into 
Eq  (62),  and  using  Eq  (57)  results  in 


=  ♦  *  (64) 

Now,  using  Eqs  (49)  and  (50)  to  derive  an  equation  analogous 
to  Eq  (35)  with  j  ■  1,  i.e.. 


P*H* '  =  (BJ) 


and  substituting  for  B*  from  Eq  (64),  the  following  auxiliary 
equation  for  P*  is  obtained: 
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Thus  when  Q  is  known,  Eqs  (61)  and  (65)  may  be  used  to  solve 
for  T*  and  P* .  Mehra  (Ref  181)  presents  an  iterative 
procedure  for  solving  equations  of  the  form  of  Eqs  (61) 
and  (65)  for  the  case  of  Q  equal  to  the  identity  matrix. 
Identification  of  the  covariance  matrices  Q  and  R  is  dis¬ 
cussed  in  the  next  section. 

One  of  the  advantages  of  using  the  canonical  form 
system  representation  discussed  above  is  that  the  number  of 
unknown  parameters  required  to  identify  the  various  system 
matrices  has  been  reduced.  In  the  original  system  repre¬ 
sentation  given  by  Eqs  (1)  and  (2),  there  are  a  total  of 
2 

n  ♦  np  +  n  unknown  parameters  required  to  identify  4> ,  Y, 
and  H.  On  the  other  hand,  in  the  canonical  form  representa- 
tion  given  by  Eqs  (49)  and  (50)  ,  there  are  only  a  total  of 
n  ♦  np  unknown  parameters  required  to  identify  4>* ,  T* ,  and 
H* .  Hence,  use  of  the  canonical  or  phase  variable  system 
representation  has  significantly  reduced  the  total  number 
of  unknown  parameters  requiring  identification. 

Identification  of  Q  and  R 

In  the  system  representation  given  by  Eqs  (1)  and  (2), 
the  covariance  matrix  of  the  input  gaussian  white-noise 
sequence  U(k),  having  disturbance  transition  matrix  T,  is 
assumed  to  be  a  symmetric  positive  definite  matrix  Q.  In 
this  section  it  is  shown  that  no  generality  is  lost  in  the 
system  model  if  the  covariance  of  U(k)  is  assumed  to  be  the 
identity  matrix,  provided  that  F  is  adjusted  accordingly. 
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The  development  begins  by  considering  the  following 
well  known  result  from  matrix  algebra  (Ref  176):  if  a  p  x  p 
symmetric  matrix  Q  is  positive  definite,  then  there  exists  a 
nonsingular  p  x  p  matrix  F  such  that 

Q  =  F • F  (66) 

Substituting  Eq  (66)  into  (30),  the  following  result  is 
obtained : 


p  =  4>p<j>'  ♦  rF'Fr’ 

=  4>P<t>*  ♦  (TF  •  )  CTF  •  )  •  (67) 

Letting  the  n  x  p  matrix  Te  S  TF'  be  an  equivalent  disturb¬ 
ance  transition  matrix,  then  Eq  (67)  implies  that  Eq  (1)  of 
the  original  system  model  representation  may  be  rewritten  as 

X  (k+ 1 )  =  4>X(k)  ♦  T  U  (k)  (68) 

In  Eq  (68),  Ue(k)  is  a  p  x  1  vector  of  equivalent  gaussian 
white-noise  having  the  following  properties: 

E[ue(k)]  =  o 

E[yc(j)u;(k))  =  i6. k 

and  I  is  the  p  x  p  identity  matrix. 

Conceptually,  in  arriving  at  Eq  (68),  the  input  dis¬ 
turbance  matrix  T  is  adjusted  in  such  a  way  (viz.,  TF)  that 
the  gaussian  whitc-noisc  input  sequence  { II  ( k )  ,  k  =  0,  1,  ...} 
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appears  to  have  its  covariance  equal  to  the  identity  matrix. 
That  is,  the  original  input  model  TU(k)  can  be  thought  of 
as  being  replaced  by  an  equivalent  input  model  F  U  (k)  . 

**  6  **  C 

Statistically,  however,  the  two  input  models  are  equivalent 
with  respect  to  the  state  vector  X(k).  Therefore,  no  loss 
in  generality  occurs  in  the  system  model  given  by  Eqs  (1) 
and  (2)  if  Q  is  replaced  by  I  in  Eq  (4)  : 

E[U(j)U'(k)]  =  16..  (69) 


The  only  parameter  remaining  to  be  identified  in  the 
system  representation  given  by  Eqs  (1)  and  (2)  is  R,  the 
covariance  of  the  gaussian  white-noise  measurement  error. 

A  method  that  yields  asymptotically  unbiased,  normal,  and 
consistent  estimates  of  R  is  suggested  by  Mehra  (Ref  78). 
The  method  is  based  on  the  autocorrelation  function  of  the 
output  measurements  when  k  =  0.  Substituting  Eq  (35)  with 
j  *  1  into  Eq  (32)  with  k  =  0  yields 


co  ■ 


+  R 


(70) 


C 

n 


Premultiplying  Eq  (37)  by  H  and  solving  the  resulting 
equation  for  a^H  results  in 
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al5  *  '“Vll1? 


-  a.HG  - 

£  —  •* 


*  -la2’  a3 ’ 


a„.i> 


H<J> 


H4>‘ 


H$‘ 


(71) 


where  &n+j  =  1.  Substituting  Eq  (34)  with  3=1  into  Eq  (71) 
and  solving  the  resulting  equation  for  yields 


si;1 


aj  ^ a2  *  a3  *  * 


a  J 

n+1 


(72) 


Substituting  Eq  (72)  into  (70),  the  following  result  is 
obtained : 


a. C_  =  -  y  a.^.C.  +  a.  R 
1  0  J+1  1  1 


from  which  the  indicated  estimate  of  R  is  found  to  be 


R  ■  - —  l  a.  .C. 

ax  5=0  >+1  ^ 


(73) 


The  Cj  in  Eq  (73)  can  be  calculated  using  Eq  (43),  and  the 
a^  can  be  calculated  using  Eq  (44). 

With  respect  to  the  statistical  properties'  of  the  state 
vector  X(k),  it  was  shown  earlier  in  this  chapter  that  no 
loss  in  generality  occurs  in  the  system  model  if  the  co- 
variance  of  the  input  noise  sequence  is  assumed  to  be  the 


60 


GSA/MA/72-S 


identity  matrix.  However,  a  major  problem  exists  if  Q  must 
be  identified  explicitly.  The  crux  of  the  problem  is  that 
even  though  the  existence  of  a  matrix  F  that  satisfies 
Eq  (66)  is  guaranteed  when  Q  is  symmetric  and  positive 
definite,  F  must  still  be  identified  in  order  to  calculate 
Q.  One  possible  way  in  which  the  problem  of  identifying 
Q  and  R  may  be  approached  is  through  adaptive  Kalman 
filtering  (Ref  79).  Although  identification  of  the  noise 
covariances  will  not  be  pursued  further  in  this  paper, 

Kalman  filtering  will  be  used  in  the  next  chapter  to  develop 
another  identification  method. 
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V .  System  Identification  Using 
Adaptive  Kalman  Fi ltering 

Parameter  identification  using  two  system  representa¬ 
tions  was  discussed  in  the  preceding  chapter.  It  was  shown 
that  distinct  advantages  could  be  obtained  by  using  a 
canonical  form  system  representation  rather  than  the  original 
system  representation.  In  this  chapter  another  model  of 
the  system,  called  "Levy's  proper  canonical  form"  (Ref  74), 
is  developed.  This  model  is  shown  to  offer  yet  additional 
advantages  in  solving  the  identification  problem. 

Levy's  proper  canonical  form  is  based  on  the  Kalman 
filter  representation  of  the  system  (Ref  101),  and  is  derived 
using  the  "innovations  sequence"  of  the  filter  (Ref  76). 
Hence,  the  chapter  begins  with  a  definition  of  the  innova¬ 
tions  sequence,  and  proceeds  to  a  development  of  some  of  its 
properties.  Using  these  results,  the  optimal  filtered  state 
estimate  is  derived,  and  combined  with  certain  results  from 
optimal  estimation  theory  to  yield  the  desired  canonical 
form . 

The  Innovat ions  Sequence 

The  innovations  approach  to  linear  1  east  -  squares  esti¬ 
mation  is  discussed  by  Kailath  (Ref  76).  Basically,  the 
innovations  approach  is  to  use  a  causal  and  causally 
invertible  linear  transformation  to  first  "whiten"  the 
observed  output  data,  i.e.,  convert  the  observed  output  to 
a  white-noise  sequence.  The  reason  for  doing  this  is  that 
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with  white-noise  observations,  the  estimation  problem  is 
greatly  simplified.  Then,  once  the  solution  to  this  simpli¬ 
fied  problem  is  obtained,  the  inverse  of  the  original 
"whitening"  filter  can  be  used  to  express  the  solution  in 
terms  of  the  original  output  observations. 

For  the  convenience  of  the  reader,  at  this  point  the 
basic  system  model  for  the  vector-input  scalar-output  case 
is  restated,  and  certain  additional  terminology  is  introduced 
(Ref  100) : 


X (k+ 1 )  =  $X(k) 

♦  ru(k) 

(74) 

Z(k)  =  Y  (k) 

♦  V(k) 

(75) 

Y (k)  =  HX (k) 

(76) 

where  Y(k)  is  the  output  message ,  V(k)  is  the  output 
measurement  noise ,  and  Z(k)  is  the  output  s i gn  a  1  (message 
plus  noise).  Using  Eqs  (16)  and  (17),  the  conditional  mean 
and  estimation  error  of  the  message  sequence  are,  respectively 

Y  (k  |  k - 1 )  =  K[Y(k)|Z(j),  0  <  j  <  k-1]  (77) 

Y  (k  |  k-  1 )  =  Y  (k )  -  Y ( k  |  k - 1 )  (78) 

The  innovations  sequence  of  Z(*),  denoted  by  a(-),  is 
defined  by  Kailath  (Ref  76)  to  be  the  difference  between  the 
output  signal  and  the  conditional  mean  of  the  message 

Substituting  tiqs  (75)  and  (78)  into  this  difference 
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yields 

a(k)  =  Z  (k)  -  Y (k | k- 1)  =  Y(k)  ♦  V(k)  -  Y(k|k-1) 

=  Y  C k | k -  1 )  ♦  V  (k)  (79) 

Another  interpretation  of  the  innovations  sequence  «(•)  may 
be  deduced  as  follows.  Taking  the  conditional  expectation 
of  Eq  (75)  given  Z(j),  0  <  j  <  k  -  1,  results  in 

Z ( k | k - 1 )  =  Y (k | k- 1 )  +  V (k | k- 1 )  (80) 

From  Eq  (15),  however,  it  is  seen  that  V(k)  is  independent 
of  Z(j),  0  <  j  <  k  -  1.  That  is,  future  measurement  noise 
is  independent  of  past  output  signals,  and  hence 

V  (k | k- 1)  =  E [V (k) |Z(j) ,  0  <  j  <  k  -  1] 

=  E [V (k)  ] 

=  0  (81) 

Therefore  substituting  Eq  (81)  into  (80)  yields 

Z  ( k | k -  1 )  =  Y  (k | k-1)  (82) 

after  which  Eq  (79)  becomes 

o(k)  =  Z (k )  -  Z  ( k  |  k -  1 )  =  Z ( k | k - 1 ) 

The  previous  equation  indicates  that  the  innovations  sequence 
a(k)  may  be  regarded  as  the  estimation  error  of  the  output 
signal,  i.e.,  the  difference  between  the  output  at  time  k  and 
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the  optimal  estimate  of  the  output  at  time  k  given  all 
previous  observations  of  the  output.  This  difference  is 
precisely  the  "new  information"  brought  by  the  latest  output 
observation  Z (k) . 

Properties  of  the  Innovations  Sequence 

The  following  properties  of  the  innovations  sequence 
o(- )  are  discussed  by  Kailath  (Ref  76)  and  Mehra  (Ref  79): 

1.  The  stochastic  process  defined  by  the  innovations 
sequence  (a(k) ,  k  *  0,  1,  ...}  is  a  zero-mean  Gauss-Markov 
sequence.  This  property  follows  from  the  fact  that  by  using 
property  2c  of  the  system  model  and  Eq  (76),  Y  ( k | k - 1 )  is 
seen  to  be  a  linear  combination  of  the  available  Z(s), 

0  <  s  <  k  -  1.  Thus,  Eq  (79)  indicates  that  a(k)  is  also  a 
linear  combination  of  the  available  Z(s),  0  <  s  <  k.  It 
follows,  therefore,  that  since  the  Z(-)  form  a  zero-mean 
Gauss-Markov  sequence  (property  la  of  the  system  model),  the 
a(*)  also  form  a  zero-mean  Gauss-Markov  sequence. 

2.  The  innovations  sequence  a ( • )  is  a  white-noise 
sequence.  This  property  may  be  established  by  using  Eq  (79) 
and  considering  the  covariance  of  the  innovations  sequence: 

E[a(j)o(k)]  =  E{[Y(jJj-l)+V(j)][Y(k|k-l)*V(k))} 

=  E[Y(j | j -  1) Y  (k  |  k-  1) ]  ♦  E (Y ( j | j -  1 ) V (k) ] 

♦  E [V ( j ) Y (k | k- 1 ) J  ♦  E [V( j ) V (k) ]  (83) 

However,  using  Eqs  (14),  (17),  and  (76),  it  follows  that 
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E[Y(j| j-l)V(k)]  =  HE[X(j| j-l)V(k)] 

=  HE [X ( j ) V  (k) ]  -  HE[X(j | j-l)V(k)] 

*  -HE [X  C j  | j-l)V(k)]  ,  k  >  j 

Using  property  2c  of  the  system  model  and  Eq  (15),  the  last 
equation  becomes 

E[Y(j | j-l)V(k)]  =0,  k  >  j  (84) 

Also,  using  Eqs  (19),  (76),  and  (78),  it  follows  that 

E [Y( j | j -1) Y(k | k-1) ]  =  E [Y ( j ) Y (k | k-  1 ) ]  -  E [Y ( j | j -  1 ) Y  (k | k- 1 )  ] 

=  E [Y ( j ) Y (k | k-  1)  ]  ,  k  >  j  (85) 

Therefore,  substituting  Eqs  (84)  and  (85)  into  (83),  and 
using  Eqs  (20)  and  (75),  the  following  result  is  obtained; 

E [a ( j ) a (k) ]  =  E [Y (j ) Y(k | k- 1) ]  +  E [V( j ) Y(k | k- 1 ) ] 

♦  E  [V  ( j  )  V  (k)  ]  ,  k  >  j 
=  E  [Z  (  j  )  Y  (  k  |  k  -  1 )  J  ♦  E[V(j)V(k)] 

=  E[V(j)V(k)J ,  k  >  j  (86) 

By  a  similar  development  it  can  be  shown  that  Eq  (86)  also 
holds  for  k  <  j.  Therefore,  substituting  F.q  (6)  into  (86) 
yields 
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E(a(j)a(k)]  =0,  j  ^  k  (87) 

The  variance  of  the  innovations  sequence  may  be  obtained 
by  substituting  j  =  k  into  Eq  (83),  and  using  Eqs  (76)  and 
(78): 

E [a(k)a(k) ]  =  E (V(k) V (k) ]  +  2E [ Y (k | k- 1) V (k) ] 

♦  E[Y(k|k-l)Y(k|k-l)] 

=  E [V(k) V (k)  ]  +  2HE [X (k) V (k) ] 

-  2HE [X(k | k-l)V(k) ]  ♦  E [ Y (k | k -  1 ) Y (k | k -  1 )  ]  (88) 

Using  Eqs  (14)  and  (15),  and  property  2c  of  the  system  model, 
Eq  (88)  becomes 

E[o(k)o(k)l  =  E[V(k)V(k)l  +  E [Y (k | k- 1 ) Y (k | k-  1 )  ]  (89) 

Defining  the  steady  state  variance  of  the  estimation  error 
of  the  message  sequence  as 

Py  =  E[Y  (k|k  —  1)Y  (k|k  —  1) )  ,  V  k  (90) 

and  using  Eq  (6),  the  following  result  is  obtained  from 
Eq  (89)  for  the  variance  of  the  innovations  sequence: 

E (a (k)a(k) )  =  R  ♦  Py,  j  =  k  (91) 

Hence,  combining  Eqs  (87)  and  (91),  the  var i ance- covar i anc e 
matrix  of  the  innovations  sequence  may  be  written  as 
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E[o(j)a(k)J  =  (R  ♦  Py)6jk  (92) 

A  comparison  of  Eqs  (6)  and  (92)  reveals  that  the  innovations 
«(•)  is  a  white-noise  sequence  like  the  measurement  noise 
V(*)>  but  with  a  different  variance. 

3.  For  causal  linear  operations  (i.e.,  linear  operations 
that  do  not  require  the  future  value  of  one  variable  to 
determine  the  current  value  of  another  variable),  a(-)  and 
Z ( • )  are  "statistically  equivalent"  and  may  be  obtained  from 
one  another.  For  the  case  where  Z(j),  0  <  j  <  k,  is  known, 
then  Y  C  k | k - 1 )  can  be  calculated  using  Eq  (82).  Hence,  from 
Eq  (79),  it  is  seen  that  a(k)  is  completely  determined  by 
Z(j)  ,  0  <  j  <  k.  For  the  case  where  ct(.)  is  known,  Z(*)  may 
be  obtained  by  using  the  celebrated  Kalman-Bucy  formula,  and 
the  proof  is  given  in  Ref  76. 

Levy 1 s  Proper  Canonical  Form 

In  this  section.  Levy's  proper  canonical  form  representa¬ 
tion  of  the  system  is  derived.  This  representation  is  based 
on  the  Kalman  filter  for  the  system,  and  is  discussed  by 
Geesey  and  Kailath  (Ref  74)  and  Mchra  (Refs  78  and  79). 
Preparatory  to  the  formal  derivation  that  follows,  however, 
a  preliminary  result  is  stated  in  the  form  of  the  following 
theorem  (Ref  76)  : 

Theorem  2^:  The  Projection  Theorem 

In  the  original  system  representation  given  by  Eqs  (74) 
through  (76),  the  best  estimate  of  the  message  error  Y  ( k  |  k  -  1 ) 
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is  unique  and  satisfies  the  following  condition: 

Y (k | k- 1)  =  Y(k)  -  Y (k | k- 1 )  J_  Z  (s)  ,  0  <  s  <  k  -  1  (93) 

where 


Y _j_  Z  means  E[YZ]  =  0  (94) 

In  words,  the  projection  theorem  states  that  the  instantan¬ 
eous  message  error  is  uncorrelated  with  the  output  signal. 

The  first  step  in  the  derivation  of  Levy's  proper 
canonical  form  involves  the  development  of  a  useful  result 
from  optimal  estimation  theory.  Meditch  (Ref  180)  shows 
that  any  linear  estimate  of  the  state  X(k)  given  the  set  of 
output  measurements  {Z(s)  =  Y(s)  +  V(s),  0  <  s  <  k)  can  be 
written  as 

k 

X (k | k)  =  l  G (k , s ) Z  (s  )  (95) 

s  =  0  ~ 


where  the  G(k,s)  are  n  x  1  vectors.  By  property  3  of  the 
innovations  sequence,  a(-)  and  Z(-)  are  statistically 
equivalent  for  causal  linear  operations,  and  therefore 
Eq  (95)  implies  that 


k 

X  (k  |  k)  =-  l  G(k,s)o(s)  (90) 

s  =  0  ~ 

In  Eq  (96),  the  vector  G ( k , • )  acts  as  a  linear  filter  that 
is  selected  such  that  the  instantaneous  error  in  measuring 
the  state  vector  is  independent  of  the  output  signal,  and 
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hence  is  also  independent  of  the  innovations.  That  is, 

X  (k  |  k)  =  X(k)  -  X  (k  |  k )  _|_  a  ( s )  ,  0  <  s  <  k  (97) 

Postnmltiplying  Eq  (97)  by  a(s)  ,  taking  the  expected  value 
of  the  result,  and  using  the  projection  theorem  results  in 

E  [X  (k  |  k)a  (s)  ]  =  0  =  E[X(k)ct(s)]  -  E  [X(k  |  k)a(s)  ] 


or 


E (X (k) a (s ) ]  =  E [X (k | k)a(s) ] 


(98) 


Substituting  Eq  (96)  into  (98) ,  and  using  Eq  (92)  yields 


E[X(k)a(s)] 


E 


k 

l  G  (k  ,  a)  a  (a)  a  (s ) 
o  =  0 


(99) 


k 

=  l  G(k,o)E[a(o)a(s)]  (100) 

a=o 

=  G(k,s) (R  ♦  Py) ,  0  <  s  <  k  (101) 

Equation  (101)  says  that  in  the  steady  state,  the 
correlation  function  between  the  state  vector  and  the  output 
signal  is  proportional  to  the  linear  filter  G(k,*)*  Since 
in  the  scalar-output  case,  R  and  Py  are  both  constant  scalars, 
the  existence  or  physical  realizability  of  G  ( k  ,  • )  is  assured. 
Hence,  from  Lq  (101)  it  follows  that 

G(k,s)  =  E[X(k)a(s) } (R  ♦  Py)'1  (102) 
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Postmultiplying  Eq  (102)  by  a(s),  summing  the  result  from 
0  to  k,  and  using  Eq  (96)  yields 

X(k|k)  =  l  (E[X(k)a(s)l(R  +  PY)'1a(s)}  (103) 

which  implies  that 

~  k+1  . 

X(k+1 | k+1)  =  l  (E[X(k+l)a(s)J (R  ♦  Pv)  a(s)} 
s  =  0  ~  1 

k 

=  l  (E[X(k+l)ct(s)]  (R  ♦  P  )"  a(s)} 
s  =  0  ~  1 

♦  E[X(k+l)a(k+l)J  (R  +  Pyj'^tk  +  l)  (104) 

If  the  steady  state  Kalman  filter  gain  is  now  defined  as  the 
n  x  1  vector  K,  where 

K  =  F.[X(.)a(.))(R  ♦  V"1 

then  Eq  (104)  becomes 

X(k+1 | k ♦ 1 )  =  1  { E [ X (k+ 1 ) a  ( s ) ] (R  +  P  )_1a(s)}  ♦  Ka(k+1) 

s  =  0 

Substituting  Eq  (74)  into  the  last  equation  yields 
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X(k*l 1  k+1)  =  l  {E[*X(k)a(s)*ru(k)ct(s)]  (R  +  P  J^afs)} 
s  =  0 

+  Ka (k  + 1 ) 

k  . 

=  4>  l  (E[X(k)a(s)l (R  ♦  PY)  a(s)} 

~  s  =  0 

k  i 

♦  T  l  (E[U(k)a(s)]  (R  +  P Y)  a(s)}+Ka(k+l)  (105) 

~  s  =  0 

However,  using  property  2c  of  the  system  model,  and  Eqs  (13), 
(76),  and  (79),  it  follows  that 

k  k 

l  (E[U(k)o(s)]}  =  l  (E(U(k)Z(s)]  -  E[U(k)Y(s|s-l)]} 
s=0  ~  s=0 

=  0  (106) 

Substituting  Eq  (106)  into  (105),  and  using  Eq  (103)  yields 
the  following  result: 

X  (k+  1  |k+l)  =  <*>  X  ( k  j  k )  +  Ka  (k+ 1 )  (107) 

For  convenience,  Eq  (18)  is  rewritten  below: 

X  (k  +  1  |  k)  =  $>X  ( k  |  k)  (108) 

The  optimal  filtered  estimate  of  the  state  X(k+1)  is 
given  by  Eq  (107),  while  the  optimal  predicted  estimate  is 
given  by  Eq  (108).  Together,  Eqs  (107)  and  (108)  constitute 
a  partial  description  of  the  steady  state  Kalman  filter  for 
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the  system  given  by  Eqs  (74)  through  (76).  In  recent  years 
the  Kalman  filter  has  enjoyed  wide  applicability  in  the  field 
of  optimal  control,  principally  because  its  recursive  form 
is  ideally  suited  for  implementation  using  digital  computers. 
For  a  complete  mathematical  description  of  the  Kalman  filter, 
it  is  also  necessary  to  determine  the  optimal  filtering  gain 
and  the  optimal  covariances  of  the  filtered  and  predicted 
estimates.  The  interested  reader  is  referred  to  the  litera¬ 
ture  for  details  (Refs  101  and  180). 

The  derivation  of  Levy's  proper  canonical  form  continues 
by  rewriting  Lq  (107)  as 

X  (k+1 |  k+  1 )  =  X  (k+  1  i k )  +  Ka(k+1) 
which  implies  that 

X (k | k)  =  X (k | k -  1 )  +  Ka (k )  (109) 

Substituting  Eq  (109)  into  (108)  yields 

X (k+1 I k)  =  $[X(k|k-l)  +  Kct(k)]  (110) 

Premultiplying  Eq  (110)  by  T,  where  T  is  defined  in  Eq  (48), 
and  using  the  linear  transformation  given  by  Eq  (47),  the 
following  form  is  obtained  for  Lq  (110): 

X*  (k  +  1 1  k)  =  4>*[X*(k|k-l)  +  K*a(k)j  (111) 

where  <&*  is  defined  in  Lq  (51),  and 

K*  =  TK  (112) 
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Solving  for  Z(k)  fro.  Eq  (79)  and  using  Eq  (76)  results  in 

Z(k)  -  Y (k | k- 1)  ♦  <*(k) 

=  MX (k | k- 1 )  ♦  a(^) 

Usl„g  the  transformation  given  by  Eqs  (47,  and  (48,.  the  iast 
equation  becomes 

Z(k)  -  H*X*(k|k-l,  4  »(k>  (>18) 

where  H*  is  defined  in  Eq  (53,. 

Taken  together,  Eqs  (111)  and  (113)  comprise  Levy’s 

proper  canonical  form,  which  is  shown  in  .lock  diagram 
representation  in  Fig.  5.  Levy’s  proper  canonical  form 
representation  has  the  following  advantages  over  the 
canonical  form  representation  given  by  Lqs  (49,  and  (50,: 

1.  The  optimal  filtered  and  predicted  estimates  of 

the  state  vector  are  chained  directly. 


Fic.  5.  Model  of  Levy's  Prop  r  Canonical  Form 
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2.  The  innovations  sequence  a(k)  is  obtained  directly 
from  the  output  measurements  Z(k)  and  the  optimal  predicted 
estimate  of  the  state  vector  X*(k|k-1),  where  X*(k|k-1)  can 
be  realized  using  the  feedback  design  shown  in  Fig.  5. 
Availability  of  the  innovations  sequence  enables  a  check  to 
be  performed  concerning  the  optimality  of  a  Kalman  filter 
constructed  using  parameter  estimates,  as  well  as  providing 
information  pertaining  to  the  statistical  quality  of  the 
parameter  estimates.  These  tests  are  discussed  by  Mehra 
(Ref  79) . 

3.  For  vector-input  scalar-output  systems.  Levy's 
proper  canonical  form  representation  effects  a  net  reduction 
in  the  number  of  unknown  parameters  to  be  estimated,  i.e., 

K*  contains  n  unknown  elements,  whereas  T*  contains  np 
unknown  elements. 

Identification  of  K* 

Although  distinct  advantages  may  be  gained  by  using 
Levy's  proper  canonical  form  over  other  system  representa¬ 
tions,  a  problem  nevertheless  exists  in  that  the  optimal 
gain  K*  remains  to  be  estimated.  In  this  section,  an 
algorithm  for  estimating  K*  is  discussed  (Ref  78).  The 
algorithm  is  based  on  the  innovations  sequence  a  ( • ) »  and  i s 
capable  of  being  implemented  on-line. 

As  pointed  out  in  the  preceding  section,  a  complete 
mathematical  description  of  the  Kalman  filter,  and  hence  of 
Levy's  proper  canonical  form,  requires  knowledge  of  the 
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optimal  filter  gain  and  the  optimal  covariance  of  the  error 
in  estimating  the  filtered  and  predicted  states.  When  all 
of  the  parameters  in  the  original  system  formulation  (viz., 

$,  T,  H,  Q,  and  R)  are  known,  expressions  for  determining 
the  optimal  gain  and  covariances  in  the  Kalman  filter  are 
well  documented  in  the  literature  (Refs  101  and  180). 

However,  when  the  system  parameters  are  initially  unknown 
and  require  estimation,  a  Kalman  filter  constructed  using 
these  parameter  estimates  may  not  be  optimal.  In  cases 
where  the  parameter  identification  is  particularly  poor, 
the  Kalman  filter  may  become  unstable  and  even  diverge 
(Refs  95  and  97).  On  the  other  hand,  when  the  parameter 
estimates  are  good,  the  Kalman  filter  will  be  stable  and 
the  state  vector  will  converge  to  its  true  value. 

Mehra  (Ref  79)  shows  that  a  necessary  and  sufficient 
condition  for  the  optimality  of  a  Kalman  filter  is  that  the 
innovations  sequence  a(*)  be  white.  This  condition  forms 
the  basis  for  the  algorithm  that  estimates  the  optimal  filter 
gain  K* ,  for  when  the  innovations  sequence  is  white,  the 
corresponding  value  of  the  filter  gain  is  optimal.  V^ry 
basically,  the  algorithm  begins  by  arbitrarily  assuming  an 
initial  value  of  the  filter  gain  Kq  and  filtering  the  output 
observations  (Z(0),  ...  Z(n)}  using  Eqs  (111)  and  (115). 

Next,  the  innovations  sequence,  which  Mehra  shows  to  be 
stationary  under  steady  state  conditions,  is  aut ocor r c 1  a t ed 
and  tested  for  whiteness.  If  the  innovations  sequence  is 
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not  white,  the  filter  gain  is  changed  to  K  ,  where  K,  is 
determined  by  a  recursive  relationship  involving  K^,  $>* ,  H* , 
and  the  normalized  autocorrelations  of  the  innovations 
sequence.  The  above  procedure  is  repeated  until  the  sequence 
o(-)  is  white.  The  sequence  of  filter  gains,  K^,  K^,  ...»  is 
shown  to  converge  to  the  optimal  value  K*  in  the  mean-square 


sense . 
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VI.  Conclusions  and  Recommendations 
for  Further  Study 

Conclusions 

This  thesis  considered  the  problem  of  the  identifica¬ 
tion  of  linear  stochastic  systems.  A  basically  non-technical 
state-of-the-art  assessment  of  the  subject  area  was  made,  and 
criteria  for  the  classification  and  selection  of  identifica¬ 
tion  methods  were  presented  and  discussed.  Several  of  the 
more  popular  identification  methods  from  the  literature  were 
investigated  and  summarized.  The  methods  were  shown  to  have 
application  to  such  diverse  fields  as  economics,  industrial 
processes,  and  aircraft  design. 

Using  the  state  variable  formulation  for  a  discrete 
linear  stochastic  system,  a  detailed  exposition  of  a  few  of 
the  on-line  identification  methods  currently  appearing  in 
the  literature  was  presented.  One  such  method,  based  on  the 
autocorrelation  function  of  the  output  measurements,  was 
developed  to  identify  the  state  transition  matrix  and  the 
output  noise  covariance.  It  was  shown  that  a  canonical  or 
phase  variable  system  representation  could  be  used  to  reduce 
the  number  of  unknown  parameters  requiring  identification. 
Finally,  an  on-iine  identification  method  called  Levy's 
proper  canonical  form,  based  on  the  Kalman  filter  representa¬ 
tion  of  the  system,  was  derived  using  the  innovations  sequence 
and  certain  results  from  optimal  estimation  theory.  It  was 
shown  that  this  identification  method  resulted  in  still 
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additional  advantages  over  the  identification  methods 
previously  developed. 

Recommendations  for  Further  Study 

Obviously,  this  thesis  has  not  answered  all  interesting 
questions  concerning  the  identification  of  linear  stochastic 
systems.  A  fruitful  area  for  further  research  is  discussed 
below . 

Regardless  of  whether  the  ultimate  purpose  of  an 
identification  analysis  is  parameter  identification  alone, 
or  for  control  application,  the  degree  of  accuracy  required 
is  an  important  factor  that  can  often  dictate  the  type  of 
identification  method  to  employ.  However,  a  major  problem 
exists  in  this  regard  in  determining  the  degree  of  accuracy 
attainable  using  a  particular  identification  method,  and 
formulating  criteria  for  measuring  that  accuracy.  Unfortu¬ 
nately,  the  literature  of  system  identification  is  far  from 
complete  in  this  area.  Therefore,  it  is  recommended  that  a 
criteria  be  developed  for  measuring  the  accuracy  and 
sensitivity  of  different  identification  methods,  and  that 
such  criteria  be  validated  by  comparing  several  identifica¬ 
tion  methods  using  simulation  techniques. 
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